Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
dce8f8ed |
| 23-Aug-2023 |
Oleg Nesterov <oleg@redhat.com> |
document while_each_thread(), change first_tid() to use for_each_thread()
Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a g
document while_each_thread(), change first_tid() to use for_each_thread()
Add the comment to explain that while_each_thread(g,t) is not rcu-safe unless g is stable (e.g. current). Even if g is a group leader and thus can't exit before t, t or another sub-thread can exec and remove g from the thread_group list.
The only lockless user of while_each_thread() is first_tid() and it is fine in that it can't loop forever, yet for_each_thread() looks better and I am going to change while_each_thread/next_thread.
Link: https://lkml.kernel.org/r/20230823170806.GA11724@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
5ffd2c37 |
| 17-Aug-2023 |
Oleg Nesterov <oleg@redhat.com> |
kill do_each_thread()
Eric has pointed out that we still have 3 users of do_each_thread(). Change them to use for_each_process_thread() and kill this helper.
There is a subtle change, after do_each
kill do_each_thread()
Eric has pointed out that we still have 3 users of do_each_thread(). Change them to use for_each_process_thread() and kill this helper.
There is a subtle change, after do_each_thread/while_each_thread g == t == &init_task, while after for_each_process_thread() they both point to nowhere, but this doesn't matter.
> Why is for_each_process_thread() better than do_each_thread()?
Say, for_each_process_thread() is rcu safe, do_each_thread() is not.
And certainly
for_each_process_thread(p, t) { do_something(p, t); }
looks better than
do_each_thread(p, t) { do_something(p, t); } while_each_thread(p, t);
And again, there are only 3 users of this awkward helper left. It should have been killed years ago and in fact I thought it had already been killed. It uses while_each_thread() which needs some changes.
Link: https://lkml.kernel.org/r/20230817163708.GA8248@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: "Christian Brauner (Microsoft)" <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jiri Slaby <jirislaby@kernel.org> # tty/serial Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32 |
|
#
8ce8849d |
| 01-Jun-2023 |
Thomas Gleixner <tglx@linutronix.de> |
posix-timers: Ensure timer ID search-loop limit is valid
posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation.
This
posix-timers: Ensure timer ID search-loop limit is valid
posix_timer_add() tries to allocate a posix timer ID by starting from the cached ID which was stored by the last successful allocation.
This is done in a loop searching the ID space for a free slot one by one. The loop has to terminate when the search wrapped around to the starting point.
But that's racy vs. establishing the starting point. That is read out lockless, which leads to the following problem:
CPU0 CPU1 posix_timer_add() start = sig->posix_timer_id; lock(hash_lock); ... posix_timer_add() if (++sig->posix_timer_id < 0) start = sig->posix_timer_id; sig->posix_timer_id = 0;
So CPU1 can observe a negative start value, i.e. -1, and the loop break never happens because the condition can never be true:
if (sig->posix_timer_id == start) break;
While this is unlikely to ever turn into an endless loop as the ID space is huge (INT_MAX), the racy read of the start value caught the attention of KCSAN and Dmitry unearthed that incorrectness.
Rewrite it so that all id operations are under the hash lock.
Reported-by: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87bkhzdn6g.ffs@tglx
show more ...
|
Revision tags: v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49 |
|
#
d80f7d7b |
| 21-Jun-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
Track how many threads have not started exiting and when the last thread starts exiting set SIGNAL_GROUP_EXIT.
This guarantees that S
signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
Track how many threads have not started exiting and when the last thread starts exiting set SIGNAL_GROUP_EXIT.
This guarantees that SIGNAL_GROUP_EXIT will get set when a process exits. In practice this achieves nothing as glibc's implementation of _exit calls sys_group_exit then sys_exit. While glibc's implemenation of pthread_exit calls exit (which cleansup and calls _exit) if it is the last thread and sys_exit if it is the last thread.
This means the only way the kernel might observe a process that does not set call exit_group is if the language runtime does not use glibc.
With more cleanups I hope to move the decrement of quick_threads earlier.
Link: https://lkml.kernel.org/r/87bkukd4tc.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38 |
|
#
31cae1ea |
| 03-May-2022 |
Peter Zijlstra <peterz@infradead.org> |
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, th
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else.
There's two spots of bother with this:
- PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable.
- An alternative freezer implementation that itself relies on a special TASK state would loose TASK_TRACED/TASK_STOPPED and will result in misbehaviour.
As such, add additional state to task->jobctl to track this state outside of task->__state.
NOTE: this doesn't actually fix anything yet, just adds extra state.
--EWB * didn't add a unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.15.37 |
|
#
2500ad1c |
| 29-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Don't change __state
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing.
Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and impleme
ptrace: Don't change __state
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing.
Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up.
Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awaked by any fatal signal that codes after TASK_TRACED is set.
Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced clears JOBCTL_PTRACE_FROZEN.
Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
e788be95 |
| 28-Apr-2022 |
Jens Axboe <axboe@kernel.dk> |
task_work: allow TWA_SIGNAL without a rescheduling IPI
Some use cases don't always need an IPI when sending a TWA_SIGNAL notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except it
task_work: allow TWA_SIGNAL without a rescheduling IPI
Some use cases don't always need an IPI when sending a TWA_SIGNAL notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except it doesn't send an IPI to the target task. It merely sets TIF_NOTIFY_SIGNAL and wakes up the task.
This can be useful in avoiding a forceful transition to the kernel if the task is running in userspace. Depending on the task_work in question, it may be quite fine waiting for the next reschedule or kernel enter anyway, or the use case may even have other mechanisms for hinting to the task that a transition may be useful. This can drive more cooperative scheduling of task_work.
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/821f42b6-7d91-8074-8212-d34998097de4@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.36, v5.15.35, v5.15.34, v5.15.33 |
|
#
78ed93d7 |
| 04-Apr-2022 |
Marco Elver <elver@google.com> |
signal: Deliver SIGTRAP on perf event asynchronously if blocked
With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. C
signal: Deliver SIGTRAP on perf event asynchronously if blocked
With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case:
<set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers>
When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task.
This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later.
To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future).
The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise).
The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ]
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com
show more ...
|
Revision tags: v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23 |
|
#
593febb1 |
| 09-Feb-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
The header tracehook.h is no place for code to live. The functions set_notify_signal and clear_notify_signal are not about
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
The header tracehook.h is no place for code to live. The functions set_notify_signal and clear_notify_signal are not about signals. They are about interruptions that act like signals. The fundamental signal primitives wind up calling set_notify_signal and clear_notify_signal. Which means they need to be maintained with the signal code.
Since set_notify_signal and clear_notify_signal must be maintained with the signal subsystem move them into sched/signal.h and claim them as part of the signal subsystem.
Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13 |
|
#
49697335 |
| 24-Jun-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Remove the helper signal_group_exit
This helper is misleading. It tests for an ongoing exec as well as the process having received a fatal signal.
Sometimes it is appropriate to treat an o
signal: Remove the helper signal_group_exit
This helper is misleading. It tests for an ongoing exec as well as the process having received a fatal signal.
Sometimes it is appropriate to treat an on-going exec differently than a process that is shutting down due to a fatal signal. In particular taking the fast path out of exit_signals instead of retargeting signals is not appropriate during exec, and not changing the the exit code in do_group_exit during exec.
Removing the helper makes it more obvious what is going on as both cases must be coded for explicitly.
While removing the helper fix the two cases where I have observed using signal_group_exit resulted in the wrong result.
In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are retargetted during an exec.
In do_group_exit use 0 as the exit code during an exec as de_thread does not set group_exit_code. As best as I can determine group_exit_code has been is set to 0 most of the time during de_thread. During a thread group stop group_exit_code is set to the stop signal and when the thread group receives SIGCONT group_exit_code is reset to 0.
Link: https://lkml.kernel.org/r/20211213225350.27481-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.10.46, v5.10.43 |
|
#
60700e38 |
| 06-Jun-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Rename group_exit_task group_exec_task
The only remaining user of group_exit_task is exec. Rename the field so that it is clear which part of the code uses it.
Update the comment above the
signal: Rename group_exit_task group_exec_task
The only remaining user of group_exit_task is exec. Rename the field so that it is clear which part of the code uses it.
Update the comment above the definition of group_exec_task to document how it is currently used.
Link: https://lkml.kernel.org/r/20211213225350.27481-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
2f824d4d |
| 08-Jan-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Remove SIGNAL_GROUP_COREDUMP
After the previous cleanups "signal->core_state" is set whenever SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested whenver the code wants to know i
signal: Remove SIGNAL_GROUP_COREDUMP
After the previous cleanups "signal->core_state" is set whenever SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested whenver the code wants to know if a coredump is in progress. The remaining tests of SIGNAL_GROUP_COREDUMP also test to see if SIGNAL_GROUP_EXIT is set. Similarly the only place that sets SIGNAL_GROUP_COREDUMP also sets SIGNAL_GROUP_EXIT.
Which makes SIGNAL_GROUP_COREDUMP unecessary and redundant. So stop setting SIGNAL_GROUP_COREDUMP, stop testing SIGNAL_GROUP_COREDUMP, and remove it's definition.
With the setting of SIGNAL_GROUP_COREDUMP gone, coredump_finish no longer needs to clear SIGNAL_GROUP_COREDUMP out of signal->flags by setting SIGNAL_GROUP_EXIT.
Link: https://lkml.kernel.org/r/20211213225350.27481-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
fcb116bc |
| 18-Nov-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
signal: Replace force_fatal_sig with force_exit_sig when in doubt
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept.
In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis.
Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
5768d890 |
| 15-Nov-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Requeue signals in the appropriate queue
In the event that a tracer changes which signal needs to be delivered and that signal is currently blocked then the signal needs to be requeued for l
signal: Requeue signals in the appropriate queue
In the event that a tracer changes which signal needs to be delivered and that signal is currently blocked then the signal needs to be requeued for later delivery.
With the advent of CLONE_THREAD the kernel has 2 signal queues per task. The per process queue and the per task queue. Update the code so that if the signal is removed from the per process queue it is requeued on the per process queue. This is necessary to make it appear the signal was never dequeued.
The rr debugger reasonably believes that the state of the process from the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated by simply letting a process run. If a SIGKILL interrupts a ptrace_stop this is not true today.
So return signals to their original queue in ptrace_signal so that signals that are not delivered appear like they were never dequeued.
Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87zgq4d5r4.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
26d5badb |
| 20-Oct-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Implement force_fatal_sig
Add a simple helper force_fatal_sig that causes a signal to be delivered to a process as if the signal handler was set to SIG_DFL.
Reimplement force_sigsegv based
signal: Implement force_fatal_sig
Add a simple helper force_fatal_sig that causes a signal to be delivered to a process as if the signal handler was set to SIG_DFL.
Reimplement force_sigsegv based upon this new helper. This fixes force_sigsegv so that when it forces the default signal handler to be used the code now forces the signal to be unblocked as well.
Reusing the tested logic in force_sig_info_to_task that was built for force_sig_seccomp this makes the implementation trivial.
This is interesting both because it makes force_sigsegv simpler and because there are a couple of buggy places in the kernel that call do_exit(SIGILL) or do_exit(SIGSYS) because there is no straight forward way today for those places to simply force the exit of a process with the chosen signal. Creating force_fatal_sig allows those places to be implemented with normal signal exits.
Link: https://lkml.kernel.org/r/20211020174406.17889-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
0258b5fd |
| 22-Sep-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
coredump: Limit coredumps to a single thread group
Today when a signal is delivered with a handler of SIG_DFL whose default behavior is to generate a core dump not only that process but every proces
coredump: Limit coredumps to a single thread group
Today when a signal is delivered with a handler of SIG_DFL whose default behavior is to generate a core dump not only that process but every process that shares the mm is killed.
In the case of vfork this looks like a real world problem. Consider the following well defined sequence.
if (vfork() == 0) { execve(...); _exit(EXIT_FAILURE); }
If a signal that generates a core dump is received after vfork but before the execve changes the mm the process that called vfork will also be killed (as the mm is shared).
Similarly if the execve fails after the point of no return the kernel delivers SIGSEGV which will kill both the exec'ing process and because the mm is shared the process that called vfork as well.
As far as I can tell this behavior is a violation of people's reasonable expectations, POSIX, and is unnecessarily fragile when the system is low on memory.
Solve this by making a userspace visible change to only kill a single process/thread group. This is possible because Jann Horn recently modified[1] the coredump code so that the mm can safely be modified while the coredump is happening. With LinuxThreads long gone I don't expect anyone to have a notice this behavior change in practice.
To accomplish this move the core_state pointer from mm_struct to signal_struct, which allows different thread groups to coredump simultatenously.
In zap_threads remove the work to kill anything except for the current thread group.
v2: Remove core_state from the VM_BUG_ON_MM print to fix compile failure when CONFIG_DEBUG_VM is enabled. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
[1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133 Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
60768ffc |
| 04-Apr-2022 |
Marco Elver <elver@google.com> |
signal: Deliver SIGTRAP on perf event asynchronously if blocked
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ]
With SIGTRAP on perf events, we have encountered termination of processe
signal: Deliver SIGTRAP on perf event asynchronously if blocked
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ]
With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case:
<set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers>
When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task.
This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later.
To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future).
The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise).
The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ]
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
60768ffc |
| 04-Apr-2022 |
Marco Elver <elver@google.com> |
signal: Deliver SIGTRAP on perf event asynchronously if blocked
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ]
With SIGTRAP on perf events, we have encountered termination of processe
signal: Deliver SIGTRAP on perf event asynchronously if blocked
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ]
With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case:
<set up SIGTRAP on a perf event> ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | <and others>); sigprocmask(SIG_BLOCK, &s, ...); ... <perf event triggers>
When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task.
This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later.
To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future).
The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise).
The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ]
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
686bf792 |
| 18-Nov-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Replace force_fatal_sig with force_exit_sig when in doubt
commit fcb116bc43c8c37c052530ead79872f8b2615711 upstream.
Recently to prevent issues with SECCOMP_RET_KILL and similar signals bein
signal: Replace force_fatal_sig with force_exit_sig when in doubt
commit fcb116bc43c8c37c052530ead79872f8b2615711 upstream.
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept.
In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis.
Reported-by: Kyle Huey <me@kylehuey.com> Reported-by: kernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Kyle Huey <khuey@kylehuey.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Backlund <tmb@iki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
110ae07d |
| 20-Oct-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Implement force_fatal_sig
commit 26d5badbccddcc063dc5174a2baffd13a23322aa upstream.
Add a simple helper force_fatal_sig that causes a signal to be delivered to a process as if the signal ha
signal: Implement force_fatal_sig
commit 26d5badbccddcc063dc5174a2baffd13a23322aa upstream.
Add a simple helper force_fatal_sig that causes a signal to be delivered to a process as if the signal handler was set to SIG_DFL.
Reimplement force_sigsegv based upon this new helper. This fixes force_sigsegv so that when it forces the default signal handler to be used the code now forces the signal to be unblocked as well.
Reusing the tested logic in force_sig_info_to_task that was built for force_sig_seccomp this makes the implementation trivial.
This is interesting both because it makes force_sigsegv simpler and because there are a couple of buggy places in the kernel that call do_exit(SIGILL) or do_exit(SIGSYS) because there is no straight forward way today for those places to simply force the exit of a process with the chosen signal. Creating force_fatal_sig allows those places to be implemented with normal signal exits.
Link: https://lkml.kernel.org/r/20211020174406.17889-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Thomas Backlund <tmb@iki.fi> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
307d522f |
| 23-Jun-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/seccomp: Refactor seccomp signal and coredump generation
Factor out force_sig_seccomp from the seccomp signal generation and place it in kernel/signal.c. The function force_sig_seccomp takes
signal/seccomp: Refactor seccomp signal and coredump generation
Factor out force_sig_seccomp from the seccomp signal generation and place it in kernel/signal.c. The function force_sig_seccomp takes a parameter force_coredump to indicate that the sigaction field should be reset to SIGDFL so that a coredump will be generated when the signal is delivered.
force_sig_seccomp is then used to replace both seccomp_send_sigsys and seccomp_init_siginfo.
force_sig_info_to_task gains an extra parameter to force using the default signal action.
With this change seccomp is no longer a special case and there becomes exactly one place do_coredump is called from.
Further it no longer becomes necessary for __seccomp_filter to call do_group_exit.
Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
a5dec9f8 |
| 26-Jul-2021 |
Frederic Weisbecker <frederic@kernel.org> |
posix-cpu-timers: Assert task sighand is locked while starting cputime counter
Starting the process wide cputime counter needs to be done in the same sighand locking sequence than actually arming th
posix-cpu-timers: Assert task sighand is locked while starting cputime counter
Starting the process wide cputime counter needs to be done in the same sighand locking sequence than actually arming the related timer otherwise this races against concurrent timers setting/expiring in the same threadgroup.
Detecting that the cputime counter is started without holding the sighand lock is a first step toward debugging such situations.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-2-frederic@kernel.org
show more ...
|
Revision tags: v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116 |
|
#
c7fff928 |
| 30-Apr-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Remove the generic __ARCH_SI_TRAPNO support
Now that __ARCH_SI_TRAPNO is no longer set by any architecture remove all of the code it enabled from the kernel.
On alpha and sparc a more expli
signal: Remove the generic __ARCH_SI_TRAPNO support
Now that __ARCH_SI_TRAPNO is no longer set by any architecture remove all of the code it enabled from the kernel.
On alpha and sparc a more explict approach of using send_sig_fault_trapno or force_sig_fault_trapno in the very limited circumstances where si_trapno was set to a non-zero value.
The generic support that is being removed always set si_trapno on all fault signals. With only SIGILL ILL_ILLTRAP on sparc and SIGFPE and SIGTRAP TRAP_UNK on alpla providing si_trapno values asking all senders of fault signals to provide an si_trapno value does not make sense.
Making si_trapno an ordinary extension of the fault siginfo layout has enabled the architecture generic implementation of SIGTRAP TRAP_PERF, and enables other faulting signals to grow architecture generic senders as well.
v1: https://lkml.kernel.org/r/m18s4zs7nu.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-8-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87bl73xx6x.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
7de5f68d |
| 28-May-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK
While reviewing the signal handlers on alpha it became clear that si_trapno is only set to a non-zero value when sending SIGFPE
signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK
While reviewing the signal handlers on alpha it became clear that si_trapno is only set to a non-zero value when sending SIGFPE and when sending SITGRAP with si_code TRAP_UNK.
Add send_sig_fault_trapno and send SIGTRAP TRAP_UNK, and SIGFPE with it.
Remove the define of __ARCH_SI_TRAPNO and remove the always zero si_trapno parameter from send_sig_fault and force_sig_fault.
v1: https://lkml.kernel.org/r/m1eeers7q7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87h7gvxx7l.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
2c9f7eaf |
| 28-May-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP
While reviewing the signal handlers on sparc it became clear that si_trapno is only set to a non-zero value when sending SIGILL with si_co
signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP
While reviewing the signal handlers on sparc it became clear that si_trapno is only set to a non-zero value when sending SIGILL with si_code ILL_ILLTRP.
Add force_sig_fault_trapno and send SIGILL ILL_ILLTRP with it.
Remove the define of __ARCH_SI_TRAPNO and remove the always zero si_trapno parameter from send_sig_fault and force_sig_fault.
v1: https://lkml.kernel.org/r/m1eeers7q7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/87mtqnxx89.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|