#
7fd3253a |
| 30-Nov-2020 |
Björn Töpel <bjorn.topel@intel.com> |
net: Introduce preferred busy-polling
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is opportunistic. That means that if the NAPI context is not scheduled, busy-polling will poll it. If, after busy-polling, the budget is exceeded, the busy-polling logic will schedule the NAPI onto the regular softirq handling.
One implication of the behavior above is that a busy/heavily loaded NAPI context will never enter/allow busy-polling. Some applications prefer that most NAPI processing be done by busy-polling.
This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature"), and allow a user to defer re-enabling interrupts and instead schedule the NAPI context from a watchdog timer. When a user enables SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed.
If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will time out, and regular softirq handling will resume.
In summary: heavy-traffic applications that prefer busy-polling over softirq processing should use this option.
Example usage:
$ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
$ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will time out and fall back to regular softirq processing.
Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
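As a rough illustration of the socket-side setup (editor's sketch, not part of the original patch), the snippet below enables both options on a UDP socket. It assumes headers that define SO_PREFER_BUSY_POLL; the fallback value 69 matches include/uapi/asm-generic/socket.h, and the 50-microsecond busy-poll budget is an arbitrary example.

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69	/* value from include/uapi/asm-generic/socket.h */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int busy_poll_usecs = 50;	/* SO_BUSY_POLL: busy-poll up to 50 us per receive call */
	int prefer = 1;			/* SO_PREFER_BUSY_POLL: keep softirq processing deferred */

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &busy_poll_usecs, sizeof(busy_poll_usecs)) < 0)
		perror("SO_BUSY_POLL");
	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
		       &prefer, sizeof(prefer)) < 0)
		perror("SO_PREFER_BUSY_POLL");

	/* ... recvmsg()/recvfrom() loop; each receive busy-polls the NAPI context ... */
	return 0;
}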
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com
|
#
20c7775a |
| 26-Nov-2020 |
Peter Zijlstra <peterz@infradead.org> |
Merge remote-tracking branch 'origin/master' into perf/core
Further perf/core patches will depend on:
d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic page table folding")
which is already in Linus' tree.
|
#
05909cd9 |
| 17-Nov-2020 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.9' into next
Sync up with mainline to bring in the latest DTS files.
|
#
4f6b838c |
| 12-Nov-2020 |
Marc Zyngier <maz@kernel.org> |
Merge tag 'v5.10-rc1' into kvmarm-master/next
Linux 5.10-rc1
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
666fab4a |
| 07-Nov-2020 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into perf/kprobes
Conflicts:
	include/asm-generic/atomic-instrumented.h
	kernel/kprobes.c
Use the upstream atomic-instrumented.h checksum, and pick the kprobes version of kernel/kprobes.c, which effectively reverts this upstream workaround:
645f224e7ba2 ("kprobes: Tell lockdep about kprobe nesting")
Since the new code *should* be fine without nesting.
Knock on wood ...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5f8f9652 |
| 05-Nov-2020 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next-queued
Catch up with v5.10-rc2 and drm-misc-next.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
01be83ee |
| 04-Nov-2020 |
Thomas Gleixner <tglx@linutronix.de> |
Merge branch 'core/urgent' into core/entry
Pick up the entry fix before further modifications.
|
#
c489573b |
| 02-Nov-2020 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-next into drm-misc-next
Daniel needs -rc2 in drm-misc-next to merge some patches
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
17bb415f |
| 01-Nov-2020 |
Thomas Gleixner <tglx@linutronix.de> |
Merge tag 'irqchip-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- A couple of fixes after the IPI as IRQ patches (Kconfig, bcm2836)
- Two SiFive PLIC fixes (irq_set_affinity, hierarchy handling)
- "unmapped events" handling for the ti-sci-inta controller
- Tidying up for the irq-mst driver (static functions, Kconfig)
- Small cleanup in the Renesas irqpin driver
- STM32 exti can now handle LP timer events
|
#
4a95857a |
| 29-Oct-2020 |
Zhenyu Wang <zhenyuw@linux.intel.com> |
Merge tag 'drm-intel-fixes-2020-10-29' into gvt-fixes
Backmerge for 5.10-rc1 to apply one extra APL fix.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
Revision tags: v5.8.17 |
|
#
f59cddd8 |
| 28-Oct-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.10-rc1' into regulator-5.10
Linux 5.10-rc1
|
#
3bfd5f42 |
| 28-Oct-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.10-rc1' into spi-5.10
Linux 5.10-rc1
|
#
ce038aea |
| 28-Oct-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.10-rc1' into asoc-5.10
Linux 5.10-rc1
|
Revision tags: v5.8.16, v5.8.15, v5.9, v5.8.14 |
|
#
319c1517 |
| 01-Oct-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
epoll: take epitem list out of struct file
Move the head of the epitem list out of struct file; for epoll files it's moved into struct eventpoll (->refs there), for non-epoll ones - into the new object (struct epitem_head). In place of ->f_ep_links we leave a pointer to the list head (->f_ep).
->f_ep is protected by ->f_lock and it's zeroed as soon as the list of epitems becomes empty (that can happen only in ep_remove() by now).
The list of files for the reverse path check is *not* going through struct file now - it's a singly-linked list going through epitem_head instances. It's terminated by ERR_PTR(-1) (== EP_UNACTIVE_POINTER), so the elements of the list can be distinguished by head->next != NULL.
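As an editor's userspace sketch (illustrative names, not the kernel code), here is what that membership test looks like: NULL means "not on the list", and a poison end-of-list value stands in for EP_UNACTIVE_POINTER.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Poison terminator playing the role of EP_UNACTIVE_POINTER: every node
 * that is on the list has a non-NULL ->next, even the last one. */
#define LIST_END ((struct node *)-1L)

struct node {
	struct node *next;	/* NULL: not on the list; LIST_END: last element */
};

static bool on_check_list(const struct node *n)
{
	return n->next != NULL;
}

static void check_list_add(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

int main(void)
{
	struct node *head = LIST_END;		/* empty list */
	struct node a = { NULL }, b = { NULL };

	assert(!on_check_list(&a));
	check_list_add(&head, &a);		/* a.next == LIST_END */
	check_list_add(&head, &b);		/* b.next == &a */
	assert(on_check_list(&a) && on_check_list(&b));
	return 0;
}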
epitem_head instances are allocated at ep_insert() time (by attach_epitem()) and freed either by ep_remove() (if it empties the set of epitems *and* epitem_head does not belong to the reverse path check list) or by clear_tfile_check_list() when the list is emptied (if the set of epitems is empty by that point). Allocations are done from a separate slab - minimal kmalloc() size is too large on some architectures.
As the result, we trim struct file _and_ get rid of the games with temporary file references.
Locking and barriers are interesting (aren't they always); see unlist_file() and ep_remove() for details. The non-obvious part is that ep_remove() needs to decide if it will be the one to free the damn thing *before* actually storing NULL to head->epitems.first - that's what smp_load_acquire is for in there.
unlist_file() lockless path is safe, since we hit it only if we observe NULL in head->epitems.first and whoever had done that store is guaranteed to have observed non-NULL in head->next. IOW, their last access had been the store of NULL into ->epitems.first and we can safely free the sucker.
OTOH, we are under rcu_read_lock() and both epitem and epitem->file have their freeing RCU-delayed. So if we see non-NULL ->epitems.first, we can grab ->f_lock (all epitems in there share the same struct file) and safely recheck the emptiness of ->epitems; again, ->next is still non-NULL, so ep_remove() couldn't have freed head yet. ->f_lock serializes us wrt ep_remove(); the rest is trivial.
Note that once head->epitems becomes NULL, nothing can get inserted into it - the only remaining reference to head after that point is from the reverse path check list.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d9f41e3c |
| 01-Oct-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
epoll: massage the check list insertion
in the "non-epoll target" cases do it in ep_insert() rather than in do_epoll_ctl(), so that we do it only with some epitem is already guaranteed to exist.
Si
epoll: massage the check list insertion
in the "non-epoll target" cases do it in ep_insert() rather than in do_epoll_ctl(), so that we do it only with some epitem is already guaranteed to exist.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b62d2706 |
| 01-Oct-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
lift rcu_read_lock() into reverse_path_check()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
Revision tags: v5.8.13 |
|
#
44cdc1d9 |
| 27-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
convert ->f_ep_links/->fllink to hlist
we don't care about the order of elements there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d1ec50ad |
| 27-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_insert(): move creation of wakeup source past the fl_ep_links insertion
That's the beginning of preparations for taking f_ep_links out of struct file. If insertion might fail, we will need a new failure exit. Having wakeup source creation done after that point will simplify life there; ep_remove() can (and commonly does) live with NULL epi->ws, so it can be used for cleanup after ep_create_wakeup_source() failure. It can't be used before the rbtree insertion, though, so if we are to unify all old failure exits, we need to move that thing down. Then we would be free to do simple kmem_cache_free() on the failure to insert into f_ep_links - no wakeup source to leak on that failure exit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2c0b71c1 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
fold ep_read_events_proc() into the only caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ad9366b1 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
take the common part of ep_eventpoll_poll() and ep_item_poll() into helper
The only reason why ep_item_poll() can't simply call ep_eventpoll_poll() (or, better yet, call vfs_poll() in all cases) is that we need to tell lockdep how deep into the hierarchy of ->mtx we are. So let's add a variant of ep_eventpoll_poll() that takes the depth explicitly and turn ep_eventpoll_poll() into a wrapper for that.
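A small userspace analogue of that shape (editor's illustration with hypothetical names, not the kernel's signatures): the depth-aware variant can recurse with depth + 1, which is why the depth cannot stay hidden inside the public entry point.

#include <stdio.h>

struct pollable {
	struct pollable *nested;	/* e.g. an epoll instance added to another one */
	int ready;
};

/* The worker takes the nesting depth explicitly; in the kernel this is what
 * the lockdep annotation for the ->mtx hierarchy would consume. */
static int poll_at_depth(struct pollable *p, int depth)
{
	printf("polling at nesting depth %d\n", depth);
	if (p->nested)
		return poll_at_depth(p->nested, depth + 1);
	return p->ready;
}

/* The public entry point is just the depth-0 wrapper. */
static int poll_entry(struct pollable *p)
{
	return poll_at_depth(p, 0);
}

int main(void)
{
	struct pollable inner = { NULL, 1 };
	struct pollable outer = { &inner, 0 };

	return poll_entry(&outer) ? 0 : 1;	/* 0 when the nested object is ready */
}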
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
85353e91 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_insert(): we only need tep->mtx around the insertion itself
We do need ep->mtx (and we are holding it all along), but that's the lock on the epoll we are inserting into; locking of the epoll being inserted is not needed for most of that work - as a matter of fact, we only need it to provide barriers for the fastpath check (for now).
Move taking and releasing it into ep_insert(). The caller (do_epoll_ctl()) doesn't need to bother with that at all. Moreover, that way we kill the kludge in ep_item_poll() - now it's always called with tep unlocked.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e3e096e7 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_insert(): don't open-code ep_remove() on failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
Revision tags: v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62 |
|
#
57804b1c |
| 31-Aug-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
lift locking/unlocking ep->mtx out of ep_{start,done}_scan()
Get rid of the depth/ep_locked arguments there and document the kludge in ep_item_poll() that has led to ep_locked's existence in the first place.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ff07952a |
| 31-Aug-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_send_events_proc(): fold into the caller
... and get rid of struct ep_send_events_data - not needed anymore. The weird way of passing the arguments in (and real return value out - nominal return value of ep_send_events_proc() is ignored) was due to the signature forced on ep_scan_ready_list() callbacks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
443f1a04 |
| 31-Aug-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
lift the calls of ep_send_events_proc() into the callers
... and kill ep_scan_ready_list()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|