Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42 |
|
#
05f26f86 |
| 25-Jul-2023 |
Zhen Lei <thunder.leizhen@huawei.com> |
epoll: simplify ep_alloc()
The get_current_user() does not fail, and moving it after kzalloc() can simplify the code a bit.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Message-Id: <2023072
epoll: simplify ep_alloc()
The get_current_user() does not fail, and moving it after kzalloc() can simplify the code a bit.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Message-Id: <20230726032135.933-1-thunder.leizhen@huaweicloud.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
Revision tags: v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32 |
|
#
2192bba0 |
| 30-May-2023 |
Benjamin Segall <bsegall@google.com> |
epoll: ep_autoremove_wake_function should use list_del_init_careful
autoremove_wake_function uses list_del_init_careful, so should epoll's more aggressive variant. It only doesn't because it was co
epoll: ep_autoremove_wake_function should use list_del_init_careful
autoremove_wake_function uses list_del_init_careful, so should epoll's more aggressive variant. It only doesn't because it was copied from an older wait.c rather than the most recent.
[bsegall@google.com: add comment] Link: https://lkml.kernel.org/r/xm26bki0ulsr.fsf_-_@google.com Link: https://lkml.kernel.org/r/xm26pm6hvfer.fsf@google.com Fixes: a16ceb139610 ("epoll: autoremove wakers even more aggressively") Signed-off-by: Ben Segall <bsegall@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.31, v6.1.30, v6.1.29 |
|
#
38f1755a |
| 11-May-2023 |
Min-Hua Chen <minhuadotchen@gmail.com> |
fs: use correct __poll_t type
Fix the following sparse warnings by using __poll_t instead of unsigned type.
fs/eventpoll.c:541:9: sparse: warning: restricted __poll_t degrades to integer fs/eventfd
fs: use correct __poll_t type
Fix the following sparse warnings by using __poll_t instead of unsigned type.
fs/eventpoll.c:541:9: sparse: warning: restricted __poll_t degrades to integer fs/eventfd.c:67:17: sparse: warning: restricted __poll_t degrades to integer
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Message-Id: <20230511164628.336586-1-minhuadotchen@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
Revision tags: v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24 |
|
#
d4cb626d |
| 11-Apr-2023 |
Davidlohr Bueso <dave@stgolabs.net> |
epoll: rename global epmutex
As of 4f04cbaf128 ("epoll: use refcount to reduce ep_mutex contention"), this lock is now specific to nesting cases - inserting an epoll fd onto another epoll fd. Renam
epoll: rename global epmutex
As of 4f04cbaf128 ("epoll: use refcount to reduce ep_mutex contention"), this lock is now specific to nesting cases - inserting an epoll fd onto another epoll fd. Rename the lock to be less generic.
Link: https://lkml.kernel.org/r/20230411234159.20421-1-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.23, v6.1.22 |
|
#
58c9b016 |
| 22-Mar-2023 |
Paolo Abeni <pabeni@redhat.com> |
epoll: use refcount to reduce ep_mutex contention
We are observing huge contention on the epmutex during an http connection/rate test:
83.17% 0.25% nginx [kernel.kallsyms] [k]
epoll: use refcount to reduce ep_mutex contention
We are observing huge contention on the epmutex during an http connection/rate test:
83.17% 0.25% nginx [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe [...] |--66.96%--__fput |--60.04%--eventpoll_release_file |--58.41%--__mutex_lock.isra.6 |--56.56%--osq_lock
The application is multi-threaded, creates a new epoll entry for each incoming connection, and does not delete it before the connection shutdown - that is, before the connection's fd close().
Many different threads compete frequently for the epmutex lock, affecting the overall performance.
To reduce the contention this patch introduces explicit reference counting for the eventpoll struct. Each registered event acquires a reference, and references are released at ep_remove() time.
The eventpoll struct is released by whoever - among EP file close() and and the monitored file close() drops its last reference.
Additionally, this introduces a new 'dying' flag to prevent races between the EP file close() and the monitored file close(). ep_eventpoll_release() marks, under f_lock spinlock, each epitem as dying before removing it, while EP file close() does not touch dying epitems.
The above is needed as both close operations could run concurrently and drop the EP reference acquired via the epitem entry. Without the above flag, the monitored file close() could reach the EP struct via the epitem list while the epitem is still listed and then try to put it after its disposal.
An alternative could be avoiding touching the references acquired via the epitems at EP file close() time, but that could leave the EP struct alive for potentially unlimited time after EP file close(), with nasty side effects.
With all the above in place, we can drop the epmutex usage at disposal time.
Overall this produces a significant performance improvement in the mentioned connection/rate scenario: the mutex operations disappear from the topmost offenders in the perf report, and the measured connections/rate grows by ~60%.
To make the change more readable this additionally renames ep_free() to ep_clear_and_put(), and moves the actual memory cleanup in a separate ep_free() helper.
Link: https://lkml.kernel.org/r/4a57788dcaf28f5eb4f8dfddcc3a8b172a7357bb.1679504153.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Xiumei Mu <xmu@redhiat.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.21, v6.1.20, v6.1.19 |
|
#
7059a9aa |
| 12-Mar-2023 |
Changcheng Liu <changchengx.liu@outlook.com> |
eventpoll: align comment with nested epoll limitation
fix comment in commit 02edc6fc4d5f ("epoll: comment the funky #ifdef")
Signed-off-by: Liu, Changcheng <changchengx.liu@outlook.com> Signed-off-
eventpoll: align comment with nested epoll limitation
fix comment in commit 02edc6fc4d5f ("epoll: comment the funky #ifdef")
Signed-off-by: Liu, Changcheng <changchengx.liu@outlook.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
show more ...
|
Revision tags: v6.1.18, v6.1.17 |
|
#
063f3ed9 |
| 10-Mar-2023 |
Palmer Dabbelt <palmer@dabbelt.com> |
Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
This doesn't make any sense to expose to userspace, so it's been moved to the one user. This was introduced by commit 95f19f658ce1 ("epoll: drop
Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
This doesn't make any sense to expose to userspace, so it's been moved to the one user. This was introduced by commit 95f19f658ce1 ("epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled").
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Reviewed-by: Andrew Waterman <waterman@eecs.berkeley.edu> Reviewed-by: Albert Ou <aou@eecs.berkeley.edu> Message-Id: <1447119071-19392-7-git-send-email-palmer@dabbelt.com> [thuth: Rebased to fix contextual conflicts] Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
show more ...
|
Revision tags: v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80 |
|
#
caf1aeaf |
| 20-Nov-2022 |
Jens Axboe <axboe@kernel.dk> |
eventpoll: add EPOLL_URING_WAKE poll wakeup flag
We can have dependencies between epoll and io_uring. Consider an epoll context, identified by the epfd file descriptor, and an io_uring file descript
eventpoll: add EPOLL_URING_WAKE poll wakeup flag
We can have dependencies between epoll and io_uring. Consider an epoll context, identified by the epfd file descriptor, and an io_uring file descriptor identified by iofd. If we add iofd to the epfd context, and arm a multishot poll request for epfd with iofd, then the multishot poll request will repeatedly trigger and generate events until terminated by CQ ring overflow. This isn't a desired behavior.
Add EPOLL_URING so that io_uring can pass it in as part of the poll wakeup key, and io_uring can check for that to detect a potential recursive invocation.
Cc: stable@vger.kernel.org # 6.0 Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55 |
|
#
693fc06e |
| 14-Jul-2022 |
Uros Bizjak <ubizjak@gmail.com> |
epoll: use try_cmpxchg in list_add_tail_lockless
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in list_add_tail_lockless. x86 CMPXCHG instruction returns success in ZF flag, so this ch
epoll: use try_cmpxchg in list_add_tail_lockless
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in list_add_tail_lockless. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg).
No functional change intended.
Link: https://lkml.kernel.org/r/20220714173255.12987-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48 |
|
#
a16ceb13 |
| 15-Jun-2022 |
Benjamin Segall <bsegall@google.com> |
epoll: autoremove wakers even more aggressively
If a process is killed or otherwise exits while having active network connections and many threads waiting on epoll_wait, the threads will all be woke
epoll: autoremove wakers even more aggressively
If a process is killed or otherwise exits while having active network connections and many threads waiting on epoll_wait, the threads will all be woken immediately, but not removed from ep->wq. Then when network traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will not remove the entries from the list.
This means that the cost of the wakeup attempt is far higher than usual, does not decrease, and this also competes with the dying threads trying to actually make progress and remove themselves from the wq.
Handle this by removing visited epoll wq entries unconditionally, rather than only when the wakeup succeeds - the structure of ep_poll means that the only potential loss is the timed_out->eavail heuristic, which now can race and result in a redundant ep_send_events attempt. (But only when incoming data and a timeout actually race, not on every timeout)
Shakeel added:
: We are seeing this issue in production with real workloads and it has : caused hard lockups. Particularly network heavy workloads with a lot : of threads in epoll_wait() can easily trigger this issue if they get : killed (oom-killed in our case).
Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com Signed-off-by: Ben Segall <bsegall@google.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Heiher <r@hev.cc> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17 |
|
#
a8f5de89 |
| 22-Jan-2022 |
Xiaoming Ni <nixiaoming@huawei.com> |
eventpoll: simplify sysctl declaration with register_sysctl()
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain.
To help with
eventpoll: simplify sysctl declaration with register_sysctl()
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic.
So move the epoll_table sysctl to fs/eventpoll.c and use register_sysctl().
Link: https://lkml.kernel.org/r/20211123202422.819032-9-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
9ccb5d39 |
| 15-Jun-2022 |
Benjamin Segall <bsegall@google.com> |
epoll: autoremove wakers even more aggressively
commit a16ceb13961068f7209e34d7984f8e42d2c06159 upstream.
If a process is killed or otherwise exits while having active network connections and many
epoll: autoremove wakers even more aggressively
commit a16ceb13961068f7209e34d7984f8e42d2c06159 upstream.
If a process is killed or otherwise exits while having active network connections and many threads waiting on epoll_wait, the threads will all be woken immediately, but not removed from ep->wq. Then when network traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will not remove the entries from the list.
This means that the cost of the wakeup attempt is far higher than usual, does not decrease, and this also competes with the dying threads trying to actually make progress and remove themselves from the wq.
Handle this by removing visited epoll wq entries unconditionally, rather than only when the wakeup succeeds - the structure of ep_poll means that the only potential loss is the timed_out->eavail heuristic, which now can race and result in a redundant ep_send_events attempt. (But only when incoming data and a timeout actually race, not on every timeout)
Shakeel added:
: We are seeing this issue in production with real workloads and it has : caused hard lockups. Particularly network heavy workloads with a lot : of threads in epoll_wait() can easily trigger this issue if they get : killed (oom-killed in our case).
Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com Signed-off-by: Ben Segall <bsegall@google.com> Tested-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Heiher <r@hev.cc> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63 |
|
#
1e1c1583 |
| 07-Sep-2021 |
Nicholas Piggin <npiggin@gmail.com> |
fs/epoll: use a per-cpu counter for user's watches count
This counter tracks the number of watches a user has, to compare against the 'max_user_watches' limit. This causes a scalability bottleneck o
fs/epoll: use a per-cpu counter for user's watches count
This counter tracks the number of watches a user has, to compare against the 'max_user_watches' limit. This causes a scalability bottleneck on SPECjbb2015 on large systems as there is only one user. Changing to a per-cpu counter increases throughput of the benchmark by about 30% on a 16-socket, > 1000 thread system.
[rdunlap@infradead.org: fix build errors in kernel/user.c when CONFIG_EPOLL=n] [npiggin@gmail.com: move ifdefs into wrapper functions, slightly improve panic message] Link: https://lkml.kernel.org/r/1628051945.fens3r99ox.astroid@bobo.none [akpm@linux-foundation.org: tweak user_epoll_alloc(), per Guenter] Link: https://lkml.kernel.org/r/20210804191421.GA1900577@roeck-us.net
Link: https://lkml.kernel.org/r/20210802032013.2751916-1-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
249dbe74 |
| 11-Aug-2021 |
Arnd Bergmann <arnd@arndb.de> |
ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation
The epoll_wait() system call wrapper is one of the remaining users of the set_fs() infrasturcture for Arm. Changing it to not requir
ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation
The epoll_wait() system call wrapper is one of the remaining users of the set_fs() infrasturcture for Arm. Changing it to not require set_fs() is rather complex unfortunately.
The approach I'm taking here is to allow architectures to override the code that copies the output to user space, and let the oabi-compat implementation check whether it is getting called from an EABI or OABI system call based on the thread_info->syscall value.
The in_oabi_syscall() check here mirrors the in_compat_syscall() and in_x32_syscall() helpers for 32-bit compat implementations on other architectures.
Overall, the amount of code goes down, at least with the newly added sys_oabi_epoll_pwait() helper getting removed again. The downside is added complexity in the source code for the native implementation. There should be no difference in runtime performance except for Arm kernels with CONFIG_OABI_COMPAT enabled that now have to go through an external function call to check which of the two variants to use.
Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
show more ...
|
Revision tags: v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35 |
|
#
7fab29e3 |
| 06-May-2021 |
Davidlohr Bueso <dave@stgolabs.net> |
fs/epoll: restore waking from ep_done_scan()
Commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") changed the userspace visible behavior of exclusive waiters blocked on a com
fs/epoll: restore waking from ep_done_scan()
Commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") changed the userspace visible behavior of exclusive waiters blocked on a common epoll descriptor upon a single event becoming ready.
Previously, all tasks doing epoll_wait would awake, and now only one is awoken, potentially causing missed wakeups on applications that rely on this behavior, such as Apache Qpid.
While the aforementioned commit aims at having only a wakeup single path in ep_poll_callback (with the exceptions of epoll_ctl cases), we need to restore the wakeup in what was the old ep_scan_ready_list() such that the next thread can be awoken, in a cascading style, after the waker's corresponding ep_send_events().
Link: https://lkml.kernel.org/r/20210405231025.33829-3-dave@stgolabs.net Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20 |
|
#
a6c67fee |
| 01-Mar-2021 |
Randy Dunlap <rdunlap@infradead.org> |
fs: eventpoll: fix comments & kernel-doc notation
Use the documented kernel-doc format for function Return: descriptions. Begin constant values in kernel-doc comments with '%'.
Remove kernel-doc "/
fs: eventpoll: fix comments & kernel-doc notation
Use the documented kernel-doc format for function Return: descriptions. Begin constant values in kernel-doc comments with '%'.
Remove kernel-doc "/**" from 2 functions that are not documented with kernel-doc notation.
Fix typos, punctuation, & grammar.
Also fix a few kernel-doc warnings:
../fs/eventpoll.c:1883: warning: Function parameter or member 'ep' not described in 'ep_loop_check_proc' ../fs/eventpoll.c:1883: warning: Excess function parameter 'priv' description in 'ep_loop_check_proc' ../fs/eventpoll.c:1932: warning: Function parameter or member 'ep' not described in 'ep_loop_check' ../fs/eventpoll.c:1932: warning: Excess function parameter 'from' description in 'ep_loop_check'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14 |
|
#
bfe3911a |
| 05-Feb-2021 |
Chris Wilson <chris@chris-wilson.co.uk> |
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.
Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store.
Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk
show more ...
|
#
58169a52 |
| 18-Dec-2020 |
Willem de Bruijn <willemb@google.com> |
epoll: add syscall epoll_pwait2
Add syscall epoll_pwait2, an epoll_wait variant with nsec resolution that replaces int timeout with struct timespec. It is equivalent otherwise.
int epoll_pwait
epoll: add syscall epoll_pwait2
Add syscall epoll_pwait2, an epoll_wait variant with nsec resolution that replaces int timeout with struct timespec. It is equivalent otherwise.
int epoll_pwait2(int fd, struct epoll_event *events, int maxevents, const struct timespec *timeout, const sigset_t *sigset);
The underlying hrtimer is already programmed with nsec resolution. pselect and ppoll also set nsec resolution timeout with timespec.
The sigset_t in epoll_pwait has a compat variant. epoll_pwait2 needs the same.
For timespec, only support this new interface on 2038 aware platforms that define __kernel_timespec_t. So no CONFIG_COMPAT_32BIT_TIME.
Link: https://lkml.kernel.org/r/20201121144401.3727659-3-willemdebruijn.kernel@gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
7cdf7c20 |
| 18-Dec-2020 |
Willem de Bruijn <willemb@google.com> |
epoll: convert internal api to timespec64
Patch series "add epoll_pwait2 syscall", v4.
Enable nanosecond timeouts for epoll.
Analogous to pselect and ppoll, introduce an epoll_wait syscall variant
epoll: convert internal api to timespec64
Patch series "add epoll_pwait2 syscall", v4.
Enable nanosecond timeouts for epoll.
Analogous to pselect and ppoll, introduce an epoll_wait syscall variant that takes a struct timespec instead of int timeout.
This patch (of 4):
Make epoll more consistent with select/poll: pass along the timeout as timespec64 pointer.
In anticipation of additional changes affecting all three polling mechanisms:
- add epoll_pwait2 syscall with timespec semantics, and share poll_select_set_timeout implementation. - compute slack before conversion to absolute time, to save one ktime_get_ts64 call.
Link: https://lkml.kernel.org/r/20201121144401.3727659-1-willemdebruijn.kernel@gmail.com Link: https://lkml.kernel.org/r/20201121144401.3727659-2-willemdebruijn.kernel@gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
e59d3c64 |
| 18-Dec-2020 |
Soheil Hassas Yeganeh <soheil@google.com> |
epoll: eliminate unnecessary lock for zero timeout
We call ep_events_available() under lock when timeout is 0, and then call it without locks in the loop for the other cases.
Instead, call ep_event
epoll: eliminate unnecessary lock for zero timeout
We call ep_events_available() under lock when timeout is 0, and then call it without locks in the loop for the other cases.
Instead, call ep_events_available() without lock for all cases. For non-zero timeouts, we will recheck after adding the thread to the wait queue. For zero timeout cases, by definition, user is opportunistically polling and will have to call epoll_wait again in the future.
Note that this lock was kept in c5a282e9635e9 because the whole loop was historically under lock.
This patch results in a 1% CPU/RPC reduction in RPC benchmarks.
Link: https://lkml.kernel.org/r/20201106231635.3528496-9-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
00b27634 |
| 18-Dec-2020 |
Soheil Hassas Yeganeh <soheil@google.com> |
epoll: replace gotos with a proper loop
The existing loop is pointless, and the labels make it really hard to follow the structure.
Replace that control structure with a simple loop that returns wh
epoll: replace gotos with a proper loop
The existing loop is pointless, and the labels make it really hard to follow the structure.
Replace that control structure with a simple loop that returns when there are new events, there is a signal, or the thread has timed out.
Link: https://lkml.kernel.org/r/20201106231635.3528496-8-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
e8c85328 |
| 18-Dec-2020 |
Soheil Hassas Yeganeh <soheil@google.com> |
epoll: pull all code between fetch_events and send_event into the loop
This is a no-op change which simplifies the follow up patches.
Link: https://lkml.kernel.org/r/20201106231635.3528496-7-soheil
epoll: pull all code between fetch_events and send_event into the loop
This is a no-op change which simplifies the follow up patches.
Link: https://lkml.kernel.org/r/20201106231635.3528496-7-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
1493c47f |
| 18-Dec-2020 |
Soheil Hassas Yeganeh <soheil@google.com> |
epoll: simplify and optimize busy loop logic
ep_events_available() is called multiple times around the busy loop logic, even though the logic is generally not used. ep_reset_busy_poll_napi_id() is
epoll: simplify and optimize busy loop logic
ep_events_available() is called multiple times around the busy loop logic, even though the logic is generally not used. ep_reset_busy_poll_napi_id() is similarly always called, even when busy loop is not used.
Eliminate ep_reset_busy_poll_napi_id() and inline it inside ep_busy_loop(). Make ep_busy_loop() return whether there are any events available after the busy loop. This will eliminate unnecessary loads and branches, and simplifies the loop.
Link: https://lkml.kernel.org/r/20201106231635.3528496-6-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
e411596d |
| 18-Dec-2020 |
Soheil Hassas Yeganeh <soheil@google.com> |
epoll: move eavail next to the list_empty_careful check
This is a no-op change and simply to make the code more coherent.
Link: https://lkml.kernel.org/r/20201106231635.3528496-5-soheil.kdev@gmail.
epoll: move eavail next to the list_empty_careful check
This is a no-op change and simply to make the code more coherent.
Link: https://lkml.kernel.org/r/20201106231635.3528496-5-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
cccd29bf |
| 18-Dec-2020 |
Soheil Hassas Yeganeh <soheil@google.com> |
epoll: pull fatal signal checks into ep_send_events()
To simplify the code, pull in checking the fatal signals into ep_send_events(). ep_send_events() is called only from ep_poll().
Note that, pre
epoll: pull fatal signal checks into ep_send_events()
To simplify the code, pull in checking the fatal signals into ep_send_events(). ep_send_events() is called only from ep_poll().
Note that, previously, we were always checking fatal events, but it is checked only if eavail is true. This should be fine because the goal of that check is to quickly return from epoll_wait() when there is a pending fatal signal.
Link: https://lkml.kernel.org/r/20201106231635.3528496-4-soheil.kdev@gmail.com Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Khazhismel Kumykov <khazhy@google.com> Cc: Guantao Liu <guantaol@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|