History log of /openbmc/linux/kernel/trace/trace.c (Results 1 – 25 of 4102)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.71
# 9144f784 09-Jan-2025 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.70' into for/openbmc/dev-6.6

This is the 6.6.70 stable release

Conflicts:
include/linux/usb/chipidea.h

Conflict was a trivial addition.

Signed-off-by: Andrew Jeffery <andrew@c

Merge tag 'v6.6.70' into for/openbmc/dev-6.6

This is the 6.6.70 stable release

Conflicts:
include/linux/usb/chipidea.h

Conflict was a trivial addition.

Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>

show more ...


Revision tags: v6.6.70, v6.6.69, v6.6.68, v6.6.67
# f452f397 16-Dec-2024 Steven Rostedt <rostedt@goodmis.org>

tracing: Check "%s" dereference via the field and not the TP_printk format

[ Upstream commit afd2627f727b89496d79a6b934a025fc916d4ded ]

The TP_printk() portion of a trace event is executed at the t

tracing: Check "%s" dereference via the field and not the TP_printk format

[ Upstream commit afd2627f727b89496d79a6b934a025fc916d4ded ]

The TP_printk() portion of a trace event is executed at the time a event
is read from the trace. This can happen seconds, minutes, hours, days,
months, years possibly later since the event was recorded. If the print
format contains a dereference to a string via "%s", and that string was
allocated, there's a chance that string could be freed before it is read
by the trace file.

To protect against such bugs, there are two functions that verify the
event. The first one is test_event_printk(), which is called when the
event is created. It reads the TP_printk() format as well as its arguments
to make sure nothing may be dereferencing a pointer that was not copied
into the ring buffer along with the event. If it is, it will trigger a
WARN_ON().

For strings that use "%s", it is not so easy. The string may not reside in
the ring buffer but may still be valid. Strings that are static and part
of the kernel proper which will not be freed for the life of the running
system, are safe to dereference. But to know if it is a pointer to a
static string or to something on the heap can not be determined until the
event is triggered.

This brings us to the second function that tests for the bad dereferencing
of strings, trace_check_vprintf(). It would walk through the printf format
looking for "%s", and when it finds it, it would validate that the pointer
is safe to read. If not, it would produces a WARN_ON() as well and write
into the ring buffer "[UNSAFE-MEMORY]".

The problem with this is how it used va_list to have vsnprintf() handle
all the cases that it didn't need to check. Instead of re-implementing
vsnprintf(), it would make a copy of the format up to the %s part, and
call vsnprintf() with the current va_list ap variable, where the ap would
then be ready to point at the string in question.

For architectures that passed va_list by reference this was possible. For
architectures that passed it by copy it was not. A test_can_verify()
function was used to differentiate between the two, and if it wasn't
possible, it would disable it.

Even for architectures where this was feasible, it was a stretch to rely
on such a method that is undocumented, and could cause issues later on
with new optimizations of the compiler.

Instead, the first function test_event_printk() was updated to look at
"%s" as well. If the "%s" argument is a pointer outside the event in the
ring buffer, it would find the field type of the event that is the problem
and mark the structure with a new flag called "needs_test". The event
itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that
this event has a field that needs to be verified before the event can be
printed using the printf format.

When the event fields are created from the field type structure, the
fields would copy the field type's "needs_test" value.

Finally, before being printed, a new function ignore_event() is called
which will check if the event has the TEST_STR flag set (if not, it
returns false). If the flag is set, it then iterates through the events
fields looking for the ones that have the "needs_test" flag set.

Then it uses the offset field from the field structure to find the pointer
in the ring buffer event. It runs the tests to make sure that pointer is
safe to print and if not, it triggers the WARN_ON() and also adds to the
trace output that the event in question has an unsafe memory access.

The ignore_event() makes the trace_check_vprintf() obsolete so it is
removed.

Link: https://lore.kernel.org/all/CAHk-=wh3uOnqnZPpR0PeLZZtyWbZLboZ7cHLCKRWsocvs9Y7hQ@mail.gmail.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.848621576@goodmis.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54
# 55841e88 03-Oct-2024 Steven Rostedt <rostedt@goodmis.org>

tracing: Fix trace_check_vprintf() when tp_printk is used

[ Upstream commit 50a3242d84ee1625b0bfef29b95f935958dccfbe ]

When the tp_printk kernel command line is used, the trace events go
directly t

tracing: Fix trace_check_vprintf() when tp_printk is used

[ Upstream commit 50a3242d84ee1625b0bfef29b95f935958dccfbe ]

When the tp_printk kernel command line is used, the trace events go
directly to printk(). It is still checked via the trace_check_vprintf()
function to make sure the pointers of the trace event are legit.

The addition of reading buffers from previous boots required adding a
delta between the addresses of the previous boot and the current boot so
that the pointers in the old buffer can still be used. But this required
adding a trace_array pointer to acquire the delta offsets.

The tp_printk code does not provide a trace_array (tr) pointer, so when
the offsets were examined, a NULL pointer dereference happened and the
kernel crashed.

If the trace_array does not exist, just default the delta offsets to zero,
as that also means the trace event is not being read from a previous boot.

Link: https://lore.kernel.org/all/Zv3z5UsG_jsO9_Tb@aschofie-mobl2.lan/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241003104925.4e1b1fd9@gandalf.local.home
Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions")
Reported-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: afd2627f727b ("tracing: Check "%s" dereference via the field and not the TP_printk format")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34
# 680c07fa 12-Jun-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Handle old buffer mappings for event strings and functions

[ Upstream commit 07714b4bb3f9800261c8b4b2f47e9010ed60979d ]

Use the saved text_delta and data_delta of a persistent memory mappe

tracing: Handle old buffer mappings for event strings and functions

[ Upstream commit 07714b4bb3f9800261c8b4b2f47e9010ed60979d ]

Use the saved text_delta and data_delta of a persistent memory mapped ring
buffer that was saved from a previous boot, and use the delta in the trace
event print output so that strings and functions show up normally.

That is, for an event like trace_kmalloc() that prints the callsite via
"%pS", if it used the address saved in the ring buffer it will not match
the function that was saved in the previous boot if the kernel remaps
itself between boots.

For RCU events that point to saved static strings where only the address
of the string is saved in the ring buffer, it too will be adjusted to
point to where the string is on the current boot.

Link: https://lkml.kernel.org/r/20240612232026.821020753@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineeth Pillai <vineeth@bitbyteword.org>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Beau Belgrave <beaub@linux.microsoft.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: afd2627f727b ("tracing: Check "%s" dereference via the field and not the TP_printk format")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6
# 6920e362 27-Oct-2023 Kees Cook <keescook@chromium.org>

seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str()

[ Upstream commit dcc4e5728eeaeda84878ca0018758cff1abfca21 ]

Solve two ergonomic issues with struct seq_buf;

1) Too much boilerplate is require

seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str()

[ Upstream commit dcc4e5728eeaeda84878ca0018758cff1abfca21 ]

Solve two ergonomic issues with struct seq_buf;

1) Too much boilerplate is required to initialize:

struct seq_buf s;
char buf[32];

seq_buf_init(s, buf, sizeof(buf));

Instead, we can build this directly on the stack. Provide
DECLARE_SEQ_BUF() macro to do this:

DECLARE_SEQ_BUF(s, 32);

2) %NUL termination is fragile and requires 2 steps to get a valid
C String (and is a layering violation exposing the "internals" of
seq_buf):

seq_buf_terminate(s);
do_something(s->buffer);

Instead, we can just return s->buffer directly after terminating it in
the refactored seq_buf_terminate(), now known as seq_buf_str():

do_something(seq_buf_str(s));

Link: https://lore.kernel.org/linux-trace-kernel/20231027155634.make.260-kees@kernel.org
Link: https://lore.kernel.org/linux-trace-kernel/20231026194033.it.702-kees@kernel.org/

Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yun Zhou <yun.zhou@windriver.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: afd2627f727b ("tracing: Check "%s" dereference via the field and not the TP_printk format")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.5.9
# c46547b4 19-Oct-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

tracing: Move readpos from seq_buf to trace_seq

[ Upstream commit d0ed46b60396cfa7e0056f55e1ce0b43c7db57b6 ]

To make seq_buf more lightweight as a string buf, move the readpos member
from seq_buf t

tracing: Move readpos from seq_buf to trace_seq

[ Upstream commit d0ed46b60396cfa7e0056f55e1ce0b43c7db57b6 ]

To make seq_buf more lightweight as a string buf, move the readpos member
from seq_buf to its container, trace_seq. That puts the responsibility
of maintaining the readpos entirely in the tracing code. If some future
users want to package up the readpos with a seq_buf, we can define a
new struct then.

Link: https://lore.kernel.org/linux-trace-kernel/20231020033545.2587554-2-willy@infradead.org

Cc: Kees Cook <keescook@chromium.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: afd2627f727b ("tracing: Check "%s" dereference via the field and not the TP_printk format")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# ed454332 05-Jan-2025 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.69' into for/openbmc/dev-6.6

This is the 6.6.69 stable release


# 03041e47 16-Dec-2024 Lizhi Xu <lizhi.xu@windriver.com>

tracing: Prevent bad count for tracing_cpumask_write

commit 98feccbf32cfdde8c722bc4587aaa60ee5ac33f0 upstream.

If a large count is provided, it will trigger a warning in bitmap_parse_user.
Also che

tracing: Prevent bad count for tracing_cpumask_write

commit 98feccbf32cfdde8c722bc4587aaa60ee5ac33f0 upstream.

If a large count is provided, it will trigger a warning in bitmap_parse_user.
Also check zero for it.

Cc: stable@vger.kernel.org
Fixes: 9e01c1b74c953 ("cpumask: convert kernel trace functions")
Link: https://lore.kernel.org/20241216073238.2573704-1-lizhi.xu@windriver.com
Reported-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0aecfd34fb878546f3fd
Tested-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# e50e86db 03-Nov-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.59' into for/openbmc/dev-6.6

This is the 6.6.59 stable release


# 91f62703 03-Mar-2024 Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing/probes: Support $argN in return probe (kprobe and fprobe)

[ Upstream commit 25f00e40ce7953db197af3a59233711d154c9d80 ]

Support accessing $argN in the return probe events. This will help use

tracing/probes: Support $argN in return probe (kprobe and fprobe)

[ Upstream commit 25f00e40ce7953db197af3a59233711d154c9d80 ]

Support accessing $argN in the return probe events. This will help users to
record entry data in function return (exit) event for simplfing the function
entry/exit information in one event, and record the result values (e.g.
allocated object/initialized object) at function exit.

For example, if we have a function `int init_foo(struct foo *obj, int param)`
sometimes we want to check how `obj` is initialized. In such case, we can
define a new return event like below;

# echo 'r init_foo retval=$retval param=$arg2 field1=+0($arg1)' >> kprobe_events

Thus it records the function parameter `param` and its result `obj->field1`
(the dereference will be done in the function exit timing) value at once.

This also support fprobe, BTF args and'$arg*'. So if CONFIG_DEBUG_INFO_BTF
is enabled, we can trace both function parameters and the return value
by following command.

# echo 'f target_function%return $arg* $retval' >> dynamic_events

Link: https://lore.kernel.org/all/170952365552.229804.224112990211602895.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Stable-dep-of: 373b9338c972 ("uprobe: avoid out-of-bounds memory access of fetching args")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 7b7fd0ac 17-Oct-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.57' into for/openbmc/dev-6.6

This is the 6.6.57 stable release


# 5915b898 20-Feb-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Have saved_cmdlines arrays all in one allocation

[ Upstream commit 0b18c852cc6fb8284ac0ab97e3e840974a6a8a64 ]

The saved_cmdlines have three arrays for mapping PIDs to COMMs:

- map_pid_to

tracing: Have saved_cmdlines arrays all in one allocation

[ Upstream commit 0b18c852cc6fb8284ac0ab97e3e840974a6a8a64 ]

The saved_cmdlines have three arrays for mapping PIDs to COMMs:

- map_pid_to_cmdline[]
- map_cmdline_to_pid[]
- saved_cmdlines

The map_pid_to_cmdline[] is PID_MAX_DEFAULT in size and holds the index
into the other arrays. The map_cmdline_to_pid[] is a mapping back to the
full pid as it can be larger than PID_MAX_DEFAULT. And the
saved_cmdlines[] just holds the COMMs associated to the pids.

Currently the map_pid_to_cmdline[] and saved_cmdlines[] are allocated
together (in reality the saved_cmdlines is just in the memory of the
rounding of the allocation of the structure as it is always allocated in
powers of two). The map_cmdline_to_pid[] array is allocated separately.

Since the rounding to a power of two is rather large (it allows for 8000
elements in saved_cmdlines), also include the map_cmdline_to_pid[] array.
(This drops it to 6000 by default, which is still plenty for most use
cases). This saves even more memory as the map_cmdline_to_pid[] array
doesn't need to be allocated.

Link: https://lore.kernel.org/linux-trace-kernel/20240212174011.068211d9@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240220140703.182330529@goodmis.org

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# ca2478a7 12-Sep-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.51' into for/openbmc/dev-6.6

This is the 6.6.51 stable release


# 993ecb4e 27-Aug-2024 Zheng Yejian <zhengyejian@huaweicloud.com>

tracing: Avoid possible softlockup in tracing_iter_reset()

commit 49aa8a1f4d6800721c7971ed383078257f12e8f9 upstream.

In __tracing_open(), when max latency tracers took place on the cpu,
the time st

tracing: Avoid possible softlockup in tracing_iter_reset()

commit 49aa8a1f4d6800721c7971ed383078257f12e8f9 upstream.

In __tracing_open(), when max latency tracers took place on the cpu,
the time start of its buffer would be updated, then event entries with
timestamps being earlier than start of the buffer would be skipped
(see tracing_iter_reset()).

Softlockup will occur if the kernel is non-preemptible and too many
entries were skipped in the loop that reset every cpu buffer, so add
cond_resched() to avoid it.

Cc: stable@vger.kernel.org
Fixes: 2f26ebd549b9a ("tracing: use timestamp to determine start of latency traces")
Link: https://lore.kernel.org/20240827124654.3817443-1-zhengyejian@huaweicloud.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 46eeaa11 03-Apr-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.24' into dev-6.6

This is the 6.6.24 stable release


# 274f0b1a 08-Mar-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Use .flush() call to wake up readers

commit e5d7c1916562f0e856eb3d6f569629fcd535fed2 upstream.

The .release() function does not get called until all readers of a file
descriptor are finish

tracing: Use .flush() call to wake up readers

commit e5d7c1916562f0e856eb3d6f569629fcd535fed2 upstream.

The .release() function does not get called until all readers of a file
descriptor are finished.

If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.

The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor then the
one that was closed(). But there's other wake ups that solve that issue.

When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.

This is what the .flush() callback is for. Have the .flush() wake up the
readers.

Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linke li <lilinke99@qq.com>
Cc: Rabin Vincent <rabin@rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# c595db6d 13-Mar-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.18' into dev-6.6

This is the 6.6.18 stable release


# 35ee34c0 06-Feb-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

eventfs: Remove eventfs_file and just use eventfs_inode

commit 5790b1fb3d672d9a1fe3881a7181dfdbe741568f upstream.

Instead of having a descriptor for every file represented in the eventfs
directory,

eventfs: Remove eventfs_file and just use eventfs_inode

commit 5790b1fb3d672d9a1fe3881a7181dfdbe741568f upstream.

Instead of having a descriptor for every file represented in the eventfs
directory, only have the directory itself represented. Change the API to
send in a list of entries that represent all the files in the directory
(but not other directories). The entry list contains a name and a callback
function that will be used to create the files when they are accessed.

struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
const struct eventfs_entry *entries,
int size, void *data);

is used for the top level eventfs directory, and returns an eventfs_inode
that will be used by:

struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
const struct eventfs_entry *entries,
int size, void *data);

where both of the above take an array of struct eventfs_entry entries for
every file that is in the directory.

The entries are defined by:

typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
const struct file_operations **fops);

struct eventfs_entry {
const char *name;
eventfs_callback callback;
};

Where the name is the name of the file and the callback gets called when
the file is being created. The callback passes in the name (in case the
same callback is used for multiple files), a pointer to the mode, data and
fops. The data will be pointing to the data that was passed in
eventfs_create_dir() or eventfs_create_events_dir() but may be overridden
to point to something else, as it will be used to point to the
inode->i_private that is created. The information passed back from the
callback is used to create the dentry/inode.

If the callback fills the data and the file should be created, it must
return a positive number. On zero or negative, the file is ignored.

This logic may also be used as a prototype to convert entire pseudo file
systems into just-in-time allocation.

The "show_events_dentry" file has been updated to show the directories,
and any files they have.

With just the eventfs_file allocations:

Before after deltas for meminfo (in kB):

MemFree: -14360
MemAvailable: -14260
Buffers: 40
Cached: 24
Active: 44
Inactive: 48
Inactive(anon): 28
Active(file): 44
Inactive(file): 20
Dirty: -4
AnonPages: 28
Mapped: 4
KReclaimable: 132
Slab: 1604
SReclaimable: 132
SUnreclaim: 1472
Committed_AS: 12

Before after deltas for slabinfo:

<slab>: <objects> [ * <size> = <total>]

ext4_inode_cache 27 [* 1184 = 31968 ]
extent_status 102 [* 40 = 4080 ]
tracefs_inode_cache 144 [* 656 = 94464 ]
buffer_head 39 [* 104 = 4056 ]
shmem_inode_cache 49 [* 800 = 39200 ]
filp -53 [* 256 = -13568 ]
dentry 251 [* 192 = 48192 ]
lsm_file_cache 277 [* 32 = 8864 ]
vm_area_struct -14 [* 184 = -2576 ]
trace_event_file 1748 [* 88 = 153824 ]
kmalloc-1k 35 [* 1024 = 35840 ]
kmalloc-256 49 [* 256 = 12544 ]
kmalloc-192 -28 [* 192 = -5376 ]
kmalloc-128 -30 [* 128 = -3840 ]
kmalloc-96 10581 [* 96 = 1015776 ]
kmalloc-64 3056 [* 64 = 195584 ]
kmalloc-32 1291 [* 32 = 41312 ]
kmalloc-16 2310 [* 16 = 36960 ]
kmalloc-8 9216 [* 8 = 73728 ]

Free memory dropped by 14,360 kB
Available memory dropped by 14,260 kB
Total slab additions in size: 1,771,032 bytes

With this change:

Before after deltas for meminfo (in kB):

MemFree: -12084
MemAvailable: -11976
Buffers: 32
Cached: 32
Active: 72
Inactive: 168
Inactive(anon): 176
Active(file): 72
Inactive(file): -8
Dirty: 24
AnonPages: 196
Mapped: 8
KReclaimable: 148
Slab: 836
SReclaimable: 148
SUnreclaim: 688
Committed_AS: 324

Before after deltas for slabinfo:

<slab>: <objects> [ * <size> = <total>]

tracefs_inode_cache 144 [* 656 = 94464 ]
shmem_inode_cache -23 [* 800 = -18400 ]
filp -92 [* 256 = -23552 ]
dentry 179 [* 192 = 34368 ]
lsm_file_cache -3 [* 32 = -96 ]
vm_area_struct -13 [* 184 = -2392 ]
trace_event_file 1748 [* 88 = 153824 ]
kmalloc-1k -49 [* 1024 = -50176 ]
kmalloc-256 -27 [* 256 = -6912 ]
kmalloc-128 1864 [* 128 = 238592 ]
kmalloc-64 4685 [* 64 = 299840 ]
kmalloc-32 -72 [* 32 = -2304 ]
kmalloc-16 256 [* 16 = 4096 ]
total = 721352

Free memory dropped by 12,084 kB
Available memory dropped by 11,976 kB
Total slab additions in size: 721,352 bytes

That's over 2 MB in savings per instance for free and available memory,
and over 1 MB in savings per instance of slab memory.

Link: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home
Link: https://lore.kernel.org/linux-trace-kernel/20231004165007.43d79161@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 49bca071 14-Feb-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Inform kmemleak of saved_cmdlines allocation

commit 2394ac4145ea91b92271e675a09af2a9ea6840b7 upstream.

The allocation of the struct saved_cmdlines_buffer structure changed from:

s

tracing: Inform kmemleak of saved_cmdlines allocation

commit 2394ac4145ea91b92271e675a09af2a9ea6840b7 upstream.

The allocation of the struct saved_cmdlines_buffer structure changed from:

s = kmalloc(sizeof(*s), GFP_KERNEL);
s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);

to:

orig_size = sizeof(*s) + val * TASK_COMM_LEN;
order = get_order(orig_size);
size = 1 << (order + PAGE_SHIFT);
page = alloc_pages(GFP_KERNEL, order);
if (!page)
return NULL;

s = page_address(page);
memset(s, 0, sizeof(*s));

s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);

Where that s->saved_cmdlines allocation looks to be a dangling allocation
to kmemleak. That's because kmemleak only keeps track of kmalloc()
allocations. For allocations that use page_alloc() directly, the kmemleak
needs to be explicitly informed about it.

Add kmemleak_alloc() and kmemleak_free() around the page allocation so
that it doesn't give the following false positive:

unreferenced object 0xffff8881010c8000 (size 32760):
comm "swapper", pid 0, jiffies 4294667296
hex dump (first 32 bytes):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
backtrace (crc ae6ec1b9):
[<ffffffff86722405>] kmemleak_alloc+0x45/0x80
[<ffffffff8414028d>] __kmalloc_large_node+0x10d/0x190
[<ffffffff84146ab1>] __kmalloc+0x3b1/0x4c0
[<ffffffff83ed7103>] allocate_cmdlines_buffer+0x113/0x230
[<ffffffff88649c34>] tracer_alloc_buffers.isra.0+0x124/0x460
[<ffffffff8864a174>] early_trace_init+0x14/0xa0
[<ffffffff885dd5ae>] start_kernel+0x12e/0x3c0
[<ffffffff885f5758>] x86_64_start_reservations+0x18/0x30
[<ffffffff885f582b>] x86_64_start_kernel+0x7b/0x80
[<ffffffff83a001c3>] secondary_startup_64_no_verify+0x15e/0x16b

Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r.fsf@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240214112046.09a322d6@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic")
Reported-by: Kalle Valo <kvalo@kernel.org>
Tested-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 6e48f4c6 09-Feb-2024 Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Fix wasted memory in saved_cmdlines logic

commit 44dc5c41b5b1267d4dd037d26afc0c4d3a568acb upstream.

While looking at improving the saved_cmdlines cache I found a huge amount
of wasted memo

tracing: Fix wasted memory in saved_cmdlines logic

commit 44dc5c41b5b1267d4dd037d26afc0c4d3a568acb upstream.

While looking at improving the saved_cmdlines cache I found a huge amount
of wasted memory that should be used for the cmdlines.

The tracing data saves pids during the trace. At sched switch, if a trace
occurred, it will save the comm of the task that did the trace. This is
saved in a "cache" that maps pids to comms and exposed to user space via
the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by
default 128 comms.

The structure that uses this creates an array to store the pids using
PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure
to be of the size of 131104 bytes on 64 bit machines.

In hex: 131104 = 0x20020, and since the kernel allocates generic memory in
powers of two, the kernel would allocate 0x40000 or 262144 bytes to store
this structure. That leaves 131040 bytes of wasted space.

Worse, the structure points to an allocated array to store the comm names,
which is 16 bytes times the amount of names to save (currently 128), which
is 2048 bytes. Instead of allocating a separate array, make the structure
end with a variable length string and use the extra space for that.

This is similar to a recommendation that Linus had made about eventfs_inode names:

https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation.org/

Instead of allocating a separate string array to hold the saved comms,
have the structure end with: char saved_cmdlines[]; and round up to the
next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN
It will use this extra space for the saved_cmdline portion.

Now, instead of saving only 128 comms by default, by using this wasted
space at the end of the structure it can save over 8000 comms and even
saves space by removing the need for allocating the other array.

Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Mete Durlu <meted@linux.ibm.com>
Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 1b0b4c42 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.13' into dev-6.6

This is the 6.6.13 stable release


# 3386ee82 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.10' into dev-6.6

This is the 6.6.10 stable release


# e6eff205 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.8' into dev-6.6

This is the 6.6.8 stable release


# d0c44de2 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.7' into dev-6.6

This is the 6.6.7 stable release


# d9a6029d 13-Dec-2023 Zheng Yejian <zhengyejian1@huawei.com>

tracing: Fix uaf issue when open the hist or hist_debug file

[ Upstream commit 1cc111b9cddc71ce161cd388f11f0e9048edffdb ]

KASAN report following issue. The root cause is when opening 'hist'
file of

tracing: Fix uaf issue when open the hist or hist_debug file

[ Upstream commit 1cc111b9cddc71ce161cd388f11f0e9048edffdb ]

KASAN report following issue. The root cause is when opening 'hist'
file of an instance and accessing 'trace_event_file' in hist_show(),
but 'trace_event_file' has been freed due to the instance being removed.
'hist_debug' file has the same problem. To fix it, call
tracing_{open,release}_file_tr() in file_operations callback to have
the ref count and avoid 'trace_event_file' being freed.

BUG: KASAN: slab-use-after-free in hist_show+0x11e0/0x1278
Read of size 8 at addr ffff242541e336b8 by task head/190

CPU: 4 PID: 190 Comm: head Not tainted 6.7.0-rc5-g26aff849438c #133
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x98/0xf8
show_stack+0x1c/0x30
dump_stack_lvl+0x44/0x58
print_report+0xf0/0x5a0
kasan_report+0x80/0xc0
__asan_report_load8_noabort+0x1c/0x28
hist_show+0x11e0/0x1278
seq_read_iter+0x344/0xd78
seq_read+0x128/0x1c0
vfs_read+0x198/0x6c8
ksys_read+0xf4/0x1e0
__arm64_sys_read+0x70/0xa8
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xb0/0x280
do_el0_svc+0x44/0x60
el0_svc+0x34/0x68
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x168/0x170

Allocated by task 188:
kasan_save_stack+0x28/0x50
kasan_set_track+0x28/0x38
kasan_save_alloc_info+0x20/0x30
__kasan_slab_alloc+0x6c/0x80
kmem_cache_alloc+0x15c/0x4a8
trace_create_new_event+0x84/0x348
__trace_add_new_event+0x18/0x88
event_trace_add_tracer+0xc4/0x1a0
trace_array_create_dir+0x6c/0x100
trace_array_create+0x2e8/0x568
instance_mkdir+0x48/0x80
tracefs_syscall_mkdir+0x90/0xe8
vfs_mkdir+0x3c4/0x610
do_mkdirat+0x144/0x200
__arm64_sys_mkdirat+0x8c/0xc0
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xb0/0x280
do_el0_svc+0x44/0x60
el0_svc+0x34/0x68
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x168/0x170

Freed by task 191:
kasan_save_stack+0x28/0x50
kasan_set_track+0x28/0x38
kasan_save_free_info+0x34/0x58
__kasan_slab_free+0xe4/0x158
kmem_cache_free+0x19c/0x508
event_file_put+0xa0/0x120
remove_event_file_dir+0x180/0x320
event_trace_del_tracer+0xb0/0x180
__remove_instance+0x224/0x508
instance_rmdir+0x44/0x78
tracefs_syscall_rmdir+0xbc/0x140
vfs_rmdir+0x1cc/0x4c8
do_rmdir+0x220/0x2b8
__arm64_sys_unlinkat+0xc0/0x100
invoke_syscall+0x70/0x260
el0_svc_common.constprop.0+0xb0/0x280
do_el0_svc+0x44/0x60
el0_svc+0x34/0x68
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x168/0x170

Link: https://lore.kernel.org/linux-trace-kernel/20231214012153.676155-1-zhengyejian1@huawei.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


12345678910>>...165