#
7fd3253a |
| 30-Nov-2020 |
Björn Töpel <bjorn.topel@intel.com> |
net: Introduce preferred busy-polling
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is opportunistic. That means that if the NAPI context is not scheduled, busy-polling will poll it. If, after busy-polling, the budget is exceeded, the busy-polling logic will schedule the NAPI onto the regular softirq handling.
One implication of the behavior above is that a busy/heavily loaded NAPI context will never enter/allow busy-polling. Some applications prefer that most NAPI processing be done by busy-polling.
This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature"), and allow a user to defer re-enabling interrupts and instead schedule the NAPI context from a watchdog timer. When a user enables SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed.
If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will time out, and regular softirq handling will resume.
In summary: heavy-traffic applications that prefer busy-polling over softirq processing should use this option.
Example usage:
$ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
$ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will time out and fall back to regular softirq processing.
Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
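As a rough illustration of the socket-side setup (editor's sketch, not part of the original patch), the snippet below enables both options on a UDP socket. It assumes headers that define SO_PREFER_BUSY_POLL; the fallback value 69 matches include/uapi/asm-generic/socket.h, and the 50-microsecond busy-poll budget is an arbitrary example.

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69	/* value from include/uapi/asm-generic/socket.h */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int busy_poll_usecs = 50;	/* SO_BUSY_POLL: busy-poll up to 50 us per receive call */
	int prefer = 1;			/* SO_PREFER_BUSY_POLL: keep softirq processing deferred */

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &busy_poll_usecs, sizeof(busy_poll_usecs)) < 0)
		perror("SO_BUSY_POLL");
	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
		       &prefer, sizeof(prefer)) < 0)
		perror("SO_PREFER_BUSY_POLL");

	/* ... recvmsg()/recvfrom() loop; each receive busy-polls the NAPI context ... */
	return 0;
}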
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com
|
#
20c7775a |
| 26-Nov-2020 |
Peter Zijlstra <peterz@infradead.org> |
Merge remote-tracking branch 'origin/master' into perf/core
Further perf/core patches will depend on:
d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic page table folding")
which is already in Linus' tree.
|
#
05909cd9 |
| 17-Nov-2020 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.9' into next
Sync up with mainline to bring in the latest DTS files.
|
#
4f6b838c |
| 12-Nov-2020 |
Marc Zyngier <maz@kernel.org> |
Merge tag 'v5.10-rc1' into kvmarm-master/next
Linux 5.10-rc1
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
666fab4a |
| 07-Nov-2020 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into perf/kprobes
Conflicts:
	include/asm-generic/atomic-instrumented.h
	kernel/kprobes.c
Use the upstream atomic-instrumented.h checksum, and pick the kprobes version of kernel/kprobes.c, which effectively reverts this upstream workaround:
645f224e7ba2 ("kprobes: Tell lockdep about kprobe nesting")
Since the new code *should* be fine without nesting.
Knock on wood ...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5f8f9652 |
| 05-Nov-2020 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next-queued
Catch up with v5.10-rc2 and drm-misc-next.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
01be83ee |
| 04-Nov-2020 |
Thomas Gleixner <tglx@linutronix.de> |
Merge branch 'core/urgent' into core/entry
Pick up the entry fix before further modifications.
|
#
c489573b |
| 02-Nov-2020 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-next into drm-misc-next
Daniel needs -rc2 in drm-misc-next to merge some patches
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
17bb415f |
| 01-Nov-2020 |
Thomas Gleixner <tglx@linutronix.de> |
Merge tag 'irqchip-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- A couple of fixes after the IPI as IRQ patches (Kconfig, bcm2836)
- Two SiFive PLIC fixes (irq_set_affinity, hierarchy handling)
- "unmapped events" handling for the ti-sci-inta controller
- Tidying up for the irq-mst driver (static functions, Kconfig)
- Small cleanup in the Renesas irqpin driver
- STM32 exti can now handle LP timer events
|
#
4a95857a |
| 29-Oct-2020 |
Zhenyu Wang <zhenyuw@linux.intel.com> |
Merge tag 'drm-intel-fixes-2020-10-29' into gvt-fixes
Backmerge for 5.10-rc1 to apply one extra APL fix.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
|
Revision tags: v5.8.17 |
|
#
f59cddd8 |
| 28-Oct-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.10-rc1' into regulator-5.10
Linux 5.10-rc1
|
#
3bfd5f42 |
| 28-Oct-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.10-rc1' into spi-5.10
Linux 5.10-rc1
|
#
ce038aea |
| 28-Oct-2020 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.10-rc1' into asoc-5.10
Linux 5.10-rc1
|
Revision tags: v5.8.16, v5.8.15, v5.9, v5.8.14 |
|
#
319c1517 |
| 01-Oct-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
epoll: take epitem list out of struct file
Move the head of the epitem list out of struct file; for epoll files it's moved into struct eventpoll (->refs there), for non-epoll ones - into the new object (struct epitem_head). In place of ->f_ep_links we leave a pointer to the list head (->f_ep).
->f_ep is protected by ->f_lock and it's zeroed as soon as the list of epitems becomes empty (that can happen only in ep_remove() by now).
The list of files for the reverse path check is *not* going through struct file now - it's a singly-linked list going through epitem_head instances. It's terminated by ERR_PTR(-1) (== EP_UNACTIVE_POINTER), so the elements of the list can be distinguished by head->next != NULL.
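As an editor's userspace sketch (illustrative names, not the kernel code), here is what that membership test looks like: NULL means "not on the list", and a poison end-of-list value stands in for EP_UNACTIVE_POINTER.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Poison terminator playing the role of EP_UNACTIVE_POINTER: every node
 * that is on the list has a non-NULL ->next, even the last one. */
#define LIST_END ((struct node *)-1L)

struct node {
	struct node *next;	/* NULL: not on the list; LIST_END: last element */
};

static bool on_check_list(const struct node *n)
{
	return n->next != NULL;
}

static void check_list_add(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

int main(void)
{
	struct node *head = LIST_END;		/* empty list */
	struct node a = { NULL }, b = { NULL };

	assert(!on_check_list(&a));
	check_list_add(&head, &a);		/* a.next == LIST_END */
	check_list_add(&head, &b);		/* b.next == &a */
	assert(on_check_list(&a) && on_check_list(&b));
	return 0;
}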
epitem_head instances are allocated at ep_insert() time (by attach_epitem()) and freed either by ep_remove() (if it empties the set of epitems *and* epitem_head does not belong to the reverse path check list) or by clear_tfile_check_list() when the list is emptied (if the set of epitems is empty by that point). Allocations are done from a separate slab - minimal kmalloc() size is too large on some architectures.
As the result, we trim struct file _and_ get rid of the games with temporary file references.
Locking and barriers are interesting (aren't they always); see unlist_file() and ep_remove() for details. The non-obvious part is that ep_remove() needs to decide if it will be the one to free the damn thing *before* actually storing NULL to head->epitems.first - that's what smp_load_acquire is for in there.
unlist_file() lockless path is safe, since we hit it only if we observe NULL in head->epitems.first and whoever had done that store is guaranteed to have observed non-NULL in head->next. IOW, their last access had been the store of NULL into ->epitems.first and we can safely free the sucker.
OTOH, we are under rcu_read_lock() and both epitem and epitem->file have their freeing RCU-delayed. So if we see non-NULL ->epitems.first, we can grab ->f_lock (all epitems in there share the same struct file) and safely recheck the emptiness of ->epitems; again, ->next is still non-NULL, so ep_remove() couldn't have freed head yet. ->f_lock serializes us wrt ep_remove(); the rest is trivial.
Note that once head->epitems becomes NULL, nothing can get inserted into it - the only remaining reference to head after that point is from the reverse path check list.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d9f41e3c |
| 01-Oct-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
epoll: massage the check list insertion
in the "non-epoll target" cases do it in ep_insert() rather than in do_epoll_ctl(), so that we do it only with some epitem is already guaranteed to exist.
Si
epoll: massage the check list insertion
in the "non-epoll target" cases do it in ep_insert() rather than in do_epoll_ctl(), so that we do it only with some epitem is already guaranteed to exist.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b62d2706 |
| 01-Oct-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
lift rcu_read_lock() into reverse_path_check()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
Revision tags: v5.8.13 |
|
#
44cdc1d9 |
| 27-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
convert ->f_ep_links/->fllink to hlist
we don't care about the order of elements there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d1ec50ad |
| 27-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_insert(): move creation of wakeup source past the fl_ep_links insertion
That's the beginning of preparations for taking f_ep_links out of struct file. If insertion might fail, we will need a new failure exit. Having wakeup source creation done after that point will simplify life there; ep_remove() can (and commonly does) live with NULL epi->ws, so it can be used for cleanup after ep_create_wakeup_source() failure. It can't be used before the rbtree insertion, though, so if we are to unify all old failure exits, we need to move that thing down. Then we would be free to do simple kmem_cache_free() on the failure to insert into f_ep_links - no wakeup source to leak on that failure exit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2c0b71c1 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
fold ep_read_events_proc() into the only caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ad9366b1 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
take the common part of ep_eventpoll_poll() and ep_item_poll() into helper
The only reason why ep_item_poll() can't simply call ep_eventpoll_poll() (or, better yet, call vfs_poll() in all cases) is that we need to tell lockdep how deep into the hierarchy of ->mtx we are. So let's add a variant of ep_eventpoll_poll() that takes the depth explicitly and turn ep_eventpoll_poll() into a wrapper for that.
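A small userspace analogue of that shape (editor's illustration with hypothetical names, not the kernel's signatures): the depth-aware variant can recurse with depth + 1, which is why the depth cannot stay hidden inside the public entry point.

#include <stdio.h>

struct pollable {
	struct pollable *nested;	/* e.g. an epoll instance added to another one */
	int ready;
};

/* The worker takes the nesting depth explicitly; in the kernel this is what
 * the lockdep annotation for the ->mtx hierarchy would consume. */
static int poll_at_depth(struct pollable *p, int depth)
{
	printf("polling at nesting depth %d\n", depth);
	if (p->nested)
		return poll_at_depth(p->nested, depth + 1);
	return p->ready;
}

/* The public entry point is just the depth-0 wrapper. */
static int poll_entry(struct pollable *p)
{
	return poll_at_depth(p, 0);
}

int main(void)
{
	struct pollable inner = { NULL, 1 };
	struct pollable outer = { &inner, 0 };

	return poll_entry(&outer) ? 0 : 1;	/* 0 when the nested object is ready */
}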
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
85353e91 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_insert(): we only need tep->mtx around the insertion itself
We do need ep->mtx (and we are holding it all along), but that's the lock on the epoll we are inserting into; locking of the epoll being inserted is not needed for most of that work - as a matter of fact, we only need it to provide barriers for the fastpath check (for now).
Move taking and releasing it into ep_insert(). The caller (do_epoll_ctl()) doesn't need to bother with that at all. Moreover, that way we kill the kludge in ep_item_poll() - now it's always called with tep unlocked.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e3e096e7 |
| 26-Sep-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_insert(): don't open-code ep_remove() on failure exits
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
Revision tags: v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62 |
|
#
57804b1c |
| 31-Aug-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
lift locking/unlocking ep->mtx out of ep_{start,done}_scan()
Get rid of the depth/ep_locked arguments there and document the kludge in ep_item_poll() that has led to ep_locked's existence in the first place.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ff07952a |
| 31-Aug-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
ep_send_events_proc(): fold into the caller
... and get rid of struct ep_send_events_data - not needed anymore. The weird way of passing the arguments in (and real return value out - nominal return value of ep_send_events_proc() is ignored) was due to the signature forced on ep_scan_ready_list() callbacks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
443f1a04 |
| 31-Aug-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
lift the calls of ep_send_events_proc() into the callers
... and kill ep_scan_ready_list()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|