Revision tags: v6.0.11

# 618d653a | 30-Nov-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: don't raw spin unlock to match cq_lock
There is one newly added place where we lock the ring with io_cq_lock() but unlock it by calling spin_unlock directly. It's ugly and troublesome in the long run. Make it consistent with the other completion locking.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4ca4f0564492b90214a190cd5b2a6c76522de138.1669821213.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
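
The pattern the commit enforces, as a minimal userspace sketch: locking and unlocking go through one matched pair of helpers, so the work tied to unlock (committing CQEs, waking waiters) cannot be skipped at a call site. The names echo io_cq_lock()/io_cq_unlock_post(), but the pthread locking and everything else here is illustrative, not the kernel code.

#include <pthread.h>

static pthread_mutex_t completion_lock = PTHREAD_MUTEX_INITIALIZER;

static void cq_lock(void)
{
	pthread_mutex_lock(&completion_lock);
}

/* Matching unlock helper: every completion path commits CQ state and
 * wakes waiters here, instead of hand-coding a bare unlock and
 * forgetting one of those steps. */
static void cq_unlock_post(void)
{
	/* ... commit cqes, wake waiters ... */
	pthread_mutex_unlock(&completion_lock);
}

static void post_completion(void)
{
	cq_lock();
	/* ... fill a cqe ... */
	cq_unlock_post();
}

int main(void)
{
	post_completion();
	return 0;
}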

Revision tags: v6.0.10, v5.15.80

# 7cfe7a09 | 25-Nov-2022 | Jens Axboe <axboe@kernel.dk>
io_uring: clear TIF_NOTIFY_SIGNAL if set and task_work not available
With how task_work is added and signaled, we can have TIF_NOTIFY_SIGNAL set and no task_work pending as it got run in a previous loop. Treat TIF_NOTIFY_SIGNAL like get_signal() does: always clear it if set, regardless of whether or not task_work is pending to run.
Cc: stable@vger.kernel.org Fixes: 46a525e199e4 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL") Signed-off-by: Jens Axboe <axboe@kernel.dk>
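
A minimal model of the fix, with none of the kernel types (the flag and list below are stand-ins for TIF_NOTIFY_SIGNAL and the task_work list): the notify flag is cleared whenever it is seen set, even if the work list is already empty because the work ran in a previous loop.

#include <stdbool.h>
#include <stddef.h>

struct work { struct work *next; };

struct task_state {
	bool notify_signal;	/* models TIF_NOTIFY_SIGNAL */
	struct work *work_list;	/* models the pending task_work list */
};

/* Mirrors the get_signal() behaviour the commit adopts: always clear
 * a set notify flag, whether or not any work is actually pending. */
static bool run_task_work(struct task_state *t)
{
	if (t->notify_signal)
		t->notify_signal = false;
	if (!t->work_list)
		return false;
	/* ... run and unlink the queued items ... */
	return true;
}

int main(void)
{
	struct task_state t = { .notify_signal = true, .work_list = NULL };
	return run_task_work(&t);	/* flag cleared even with no work */
}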

# 27f35fe9 | 25-Nov-2022 | Dylan Yudaken <dylany@meta.com>
io_uring: remove io_req_complete_post_tw
It's only used in one place. Inline it.
Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221125103412.1425305-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# b529c96a | 24-Nov-2022 | Dylan Yudaken <dylany@meta.com>
io_uring: remove overflow param from io_post_aux_cqe
The only call sites which would not allow overflow are also the call sites which would use io_aux_cqe, as they care about ordering.
So remove this parameter from io_post_aux_cqe.
Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-9-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# a77ab745 | 24-Nov-2022 | Dylan Yudaken <dylany@meta.com>
io_uring: make io_fill_cqe_aux static
This is only used in io_uring.c
Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-7-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# 9b8c5475 | 24-Nov-2022 | Dylan Yudaken <dylany@meta.com>
io_uring: add io_aux_cqe which allows deferred completion
Use the just-introduced deferred post cqe completion state when possible in io_aux_cqe. If not possible, fall back to io_post_aux_cqe.
This introduces a complication because of allow_overflow. For deferred completions we cannot know without locking the completion_lock if it will overflow (and even if we locked it, another post could sneak in and cause this cqe to be in overflow). However, since overflow protection is mostly a best-effort defence in depth to prevent infinite loops of CQEs for poll, just checking the overflow bit is going to be good enough and will result in at most 16 (the array size of deferred cqes) overflows.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
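
The decision logic reads roughly as below; this is an illustrative sketch with made-up names (struct ring, aux_cqe, DEFER_MAX), not the io_uring implementation. Deferral is used when the ring hasn't already overflowed and the small array has room; otherwise it falls back to the locked immediate post.

#include <stdbool.h>

#define DEFER_MAX 16	/* the deferred-cqe array size mentioned above */

struct cqe { long long user_data; int res; unsigned int flags; };

struct ring {
	bool cq_overflown;	/* overflow bit, read without the lock */
	int nr_deferred;
	struct cqe deferred[DEFER_MAX];
};

/* Slow-path stand-in: would take completion_lock and post or overflow. */
static bool post_aux_cqe_locked(struct ring *r, struct cqe c)
{
	(void)r; (void)c;
	return true;
}

/* Checking cq_overflown without the lock is racy, but as argued above
 * it bounds the damage to at most DEFER_MAX overflowed CQEs. */
static bool aux_cqe(struct ring *r, struct cqe c)
{
	if (!r->cq_overflown && r->nr_deferred < DEFER_MAX) {
		r->deferred[r->nr_deferred++] = c;
		return true;
	}
	return post_aux_cqe_locked(r, c);
}

int main(void)
{
	struct ring r = { 0 };
	struct cqe c = { .user_data = 1 };
	return !aux_cqe(&r, c);
}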

# 973fc83f | 24-Nov-2022 | Dylan Yudaken <dylany@meta.com>
io_uring: defer all io_req_complete_failed
All failures happen under lock now, and can be deferred. To be consistent when the failure has happened after some multishot cqe has been deferred (and to keep ordering), always defer failures.
To make this obvious at the caller (and to help prevent a future bug) rename io_req_complete_failed to io_req_defer_failed.
Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221124093559.3780686-4-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# 1bec951c | 23-Nov-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: iopoll protect complete_post
io_req_complete_post() may be used by iopoll-enabled rings, so grab locks in this case. That requires passing issue_flags to propagate the locking state.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cc6d854065c57c838ca8e8806f707a226b70fd2d.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
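
The shape of the change, sketched in userspace: a bit in issue_flags records whether the caller already holds the ring lock, and the completion path locks conditionally. The flag name nods to the kernel's IO_URING_F_UNLOCKED; the rest is illustrative.

#include <pthread.h>
#include <stdbool.h>

#define ISSUE_F_UNLOCKED 0x1	/* caller does NOT hold the ring lock */

static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;

/* Completion takes the lock only when the propagated issue_flags say
 * the caller (e.g. the iopoll path) hasn't taken it already. */
static void req_complete_post(unsigned int issue_flags)
{
	bool need_lock = issue_flags & ISSUE_F_UNLOCKED;

	if (need_lock)
		pthread_mutex_lock(&ring_lock);
	/* ... post the completion ... */
	if (need_lock)
		pthread_mutex_unlock(&ring_lock);
}

int main(void)
{
	req_complete_post(ISSUE_F_UNLOCKED);
	return 0;
}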

# 833b5dff | 23-Nov-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: remove io_req_tw_post_queue
Remove io_req_tw_post() and io_req_tw_post_queue(); we can use io_req_task_complete() instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b9b73c08022c7f1457023ac841f35c0100e70345.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# 44648532 | 20-Nov-2022 | Jens Axboe <axboe@kernel.dk>
io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related wakeups, so that we can check for a circular event dependency between eventfd and epoll. If this flag is set when our wakeup handlers are called, then we know we have a dependency that needs to terminate multishot requests.
eventfd and epoll are the only such possible dependencies.
Cc: stable@vger.kernel.org # 6.0 Signed-off-by: Jens Axboe <axboe@kernel.dk>
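
On the receiving end the idea looks roughly like this; the constant and callback below are stand-ins, not the kernel's poll API. A wakeup key carrying the io_uring-originated bit tells the handler that re-arming would feed the wakeup back into itself, so the multishot request is terminated instead.

#include <stdbool.h>

#define POLL_KEY_URING_WAKE 0x1	/* stand-in for EPOLL_URING_WAKE */

/* Sketch of a poll wakeup callback: an eventfd registered on the ring
 * and polled through epoll would otherwise wake itself forever. */
static bool poll_wake(unsigned int key, bool *keep_armed)
{
	if (key & POLL_KEY_URING_WAKE) {
		*keep_armed = false;	/* terminate the multishot request */
		return true;		/* but still deliver this event */
	}
	*keep_armed = true;
	return true;
}

int main(void)
{
	bool armed;
	poll_wake(POLL_KEY_URING_WAKE, &armed);
	return armed;	/* 0: the circular wakeup disarmed the request */
}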

# f9d567c7 | 17-Nov-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: inline __io_req_complete_post()
There is only one user of __io_req_complete_post(), inline it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ef4c9059950a3da5cf68df00f977f1fd13bd9306.1668597569.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v6.0.9, v5.15.79

# e52d2e58 | 11-Nov-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: inline io_req_task_work_add()
__io_req_task_work_add() is huge but marked inline, which makes compilers generate lots of garbage. Inline the wrapper caller io_req_task_work_add() instead.
before and after:
   text    data     bss     dec     hex filename
  47347   16248       8   63603    f873 io_uring/io_uring.o
   text    data     bss     dec     hex filename
  45303   16248       8   61559    f077 io_uring/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/26dc8c28ca0160e3269ef3e55c5a8b917c4d4450.1668162751.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
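
The code-size trick itself, as a standalone sketch with generic names: the large body is kept out of line so the object file holds one copy, and only the trivial wrapper that call sites use is inlined.

#include <stdio.h>

/* Large body stays out of line: one copy in the object file rather
 * than one per call site. */
static void add_work_slowpath(int item)
{
	/* imagine a few hundred bytes of queueing logic here */
	printf("queued %d\n", item);
}

/* Only this thin wrapper gets inlined into callers, so each call
 * site costs a call instruction, not a copy of the body above. */
static inline void add_work(int item)
{
	add_work_slowpath(item);
}

int main(void)
{
	add_work(42);
	return 0;
}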

# 91482864 | 17-Nov-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: fix multishot accept request leaks
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's a multishot issue when it's not, it can only ask to skip completion, thereby leaking the request. Use issue_flags to mark multipoll issues.
Cc: stable@vger.kernel.org Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6

# b3026767 | 27-Oct-2022 | Dylan Yudaken <dylany@meta.com>
io_uring: unlock if __io_run_local_work locked inside
It is possible for tw to lock the ring, and this was not propagated out to io_run_local_work. This can cause an unlock to be missed.
Instead pass a pointer to locked into __io_run_local_work.
Fixes: 8ac5d85a89b4 ("io_uring: add local task_work run helper that is entered locked") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com [axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment] Signed-off-by: Jens Axboe <axboe@kernel.dk>
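
The fix's shape, modeled with a pthread mutex (illustrative, not the kernel code): the inner helper records through *locked whether it took the lock, so the caller knows to release it rather than the lock silently staying held.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;

/* The inner run may need to lock mid-way; reporting that through
 * *locked propagates the final lock state to the caller. */
static int __run_local_work(bool *locked)
{
	if (!*locked) {
		pthread_mutex_lock(&uring_lock);
		*locked = true;
	}
	/* ... run the queued work under the lock ... */
	return 0;
}

static int run_local_work(void)
{
	bool locked = false;
	int ret = __run_local_work(&locked);

	if (locked)	/* unlock iff the inner helper locked */
		pthread_mutex_unlock(&uring_lock);
	return ret;
}

int main(void)
{
	return run_local_work();
}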

Revision tags: v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1

# 44f87745 | 06-Oct-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: optimise locking for local tw with submit_wait
Running local task_work requires taking uring_lock; for submit + wait we can try to run them right after submit while we still hold the lock and save one lock/unlock pair. The optimisation was implemented in the first local tw patches but got dropped for simplicity.
Suggested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/281fc79d98b5d91fe4778c5137a17a2ab4693e5c.1665088876.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# fc86f9d3 | 05-Oct-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: remove redundant memory barrier in io_req_local_work_add
io_cqring_wake() needs a barrier for the waitqueue_active() check. However, in the case of io_req_local_work_add(), we call llist_add() first, which implies an atomic. Hence we can replace smp_mb() with smp_mb__after_atomic().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43983bc8bc507172adda7a0f00cab1aff09fd238.1665018309.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
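
A loose C11 analogy of the ordering argument (the kernel primitives have no exact userspace equivalent, so treat this as a sketch): an atomic read-modify-write already orders the list update against the waiter check, which is why the separate full barrier can be dropped to the cheaper after-atomic form.

#include <stdatomic.h>

static atomic_int work_count;
static atomic_int waiters;

/* llist_add() is an atomic RMW; on architectures where such RMWs
 * imply a full barrier, a following smp_mb() is wasted work, and
 * smp_mb__after_atomic() completes the barrier only where the
 * atomic didn't. Here the seq_cst RMW and seq_cst load already
 * give the store-load ordering, so no explicit fence appears. */
static int add_work_and_check_waiters(void)
{
	atomic_fetch_add(&work_count, 1);	/* models llist_add() */
	return atomic_load(&waiters);		/* models waitqueue_active() */
}

int main(void)
{
	return add_work_and_check_waiters();
}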

Revision tags: v5.15.72, v6.0

# 46a525e1 | 29-Sep-2022 | Jens Axboe <axboe@kernel.dk>
io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL
This isn't a reliable mechanism to tell if we have task_work pending, we really should be looking at whether we have any items queued. This is problematic if forward progress is gated on running said task_work. One such example is reading from a pipe, where the write side has been closed right before the read is started. The fput() of the file queues TWA_RESUME task_work, and we need that task_work to be run before ->release() is called for the pipe. If ->release() isn't called, then the read will sit forever waiting on data that will never arise.
Fix this by having io_run_task_work() check if we have task_work pending rather than relying on TIF_NOTIFY_SIGNAL for that. The latter obviously doesn't work for task_work that is queued without TWA_SIGNAL.
Reported-by: Christiano Haesbaert <haesbaert@haesbaert.org> Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/665 Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v5.15.71

# aa1df3a3 | 23-Sep-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: fix CQE reordering
Overflowing CQEs may result in reordering, which is buggy in case of links, F_MORE and so on. If we guarantee that we don't reorder for the unlikely event of a CQ ring overflow, then we can further extend this to not have to terminate multishot requests if it happens. For other operations, like zerocopy sends, we have no choice but to honor CQE ordering.
Reported-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ec3bc55687b0768bbe20fb62d7d06cfced7d7e70.1663892031.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v5.15.70, v5.15.69, v5.15.68

# 6567506b | 08-Sep-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: disallow defer-tw run w/ no submitters
We try to restrict CQ waiters when IORING_SETUP_DEFER_TASKRUN is set, but if nothing has been submitted yet it'll allow any waiter, which violates the contract.
Fixes: c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/b4f0d3f14236d7059d08c5abe2661ef0b78b5528.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

# 76de6749 | 08-Sep-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: further limit non-owner defer-tw cq waiting
In case of DEFER_TASK_WORK we try to restrict waiters to only one task, which is also the only submitter; however, we don't do it reliably, which might be very confusing and backfire in the future. E.g. we currently allow multiple tasks in io_iopoll_check().
Fixes: c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/94c83c0a7fe468260ee2ec31bdb0095d6e874ba2.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v5.15.67, v5.15.66, v5.15.65

# dac6a0ea | 03-Sep-2022 | Jens Axboe <axboe@kernel.dk>
io_uring: ensure iopoll runs local task work as well
Combine the two checks we have for task_work running and whether or not we need to shuffle the mutex into one, so we unify how task_work is run in the iopoll loop. This helps ensure that local task_work is run when needed, and also optimizes that path to avoid a mutex shuffle if it's not needed.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

# 8ac5d85a | 03-Sep-2022 | Jens Axboe <axboe@kernel.dk>
io_uring: add local task_work run helper that is entered locked
We have a few spots that drop the mutex just to run local task_work, which immediately tries to grab it again. Add a helper that just passes in whether we're locked already.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v5.15.64

# c0e0d6ba | 30-Aug-2022 | Dylan Yudaken <dylany@fb.com>
io_uring: add IORING_SETUP_DEFER_TASKRUN
Allow deferring async tasks until the user calls io_uring_enter(2) with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at io_uring_setup time. This functionality requires that the later io_uring_enter will be called from the same submission task, so this flag only works when IORING_SETUP_SINGLE_ISSUER is also set.
Being able to hand-pick when task work is run prevents the problem where there is current work to be done but task work runs anyway.
For example, a common workload would obtain a batch of CQEs and process each one. Interrupting this to run additional task work would add latency but not gain anything. If instead task work is deferred to just before more CQEs are obtained, then no additional latency is added.
The way this is implemented is by trying to keep task work local to an io_ring_ctx, rather than to the submission task. This is required, as the application will want to wake up only a single io_ring_ctx at a time to process work, and so the lists of work have to be kept separate.
This has some other benefits like not having to check the task continually in handle_tw_list (and potentially unlocking/locking those), and reducing locks in the submit & process completions path.
There are networking cases where using this option can reduce request latency by 50%. For example, a contrived benchmark using [1], where the client sends 2k of data and receives the same data back while doing some system calls (to trigger task work), shows this reduction. The reason ends up being that if sending responses is delayed by processing task work, then the client side sits idle. Whereas reordering the sends first means that the client runs its workload in parallel with the local task work.
[1]: Using https://github.com/DylanZA/netbench/tree/defer_run Client: ./netbench --client_only 1 --control_port 10000 --host <host> --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000" Server: ./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100"
Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
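
Opting in from userspace looks like this with liburing (a sketch: assumes a 6.1+ kernel and a liburing release that defines these flags; error handling trimmed). Both flags go in at setup, and the deferred task work then runs inside the GETEVENTS enter on the submitting task.

#include <liburing.h>
#include <string.h>

int main(void)
{
	struct io_uring_params p;
	struct io_uring ring;
	struct io_uring_cqe *cqe;

	memset(&p, 0, sizeof(p));
	/* DEFER_TASKRUN requires SINGLE_ISSUER: only the task that set
	 * the ring up may submit and wait on it. */
	p.flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
	if (io_uring_queue_init_params(8, &ring, &p) < 0)
		return 1;	/* kernel too old for these flags */

	io_uring_prep_nop(io_uring_get_sqe(&ring));
	/* Deferred task work runs here, just before CQEs are reaped,
	 * not at arbitrary preemption points. */
	io_uring_submit_and_wait(&ring, 1);

	if (!io_uring_peek_cqe(&ring, &cqe))
		io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}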

Revision tags: v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58

# bd1a3783 | 27-Jul-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: export req alloc from core
We want to do request allocation out of the core io_uring code, so make the allocation functions public for other io_uring parts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0314fedd3a02a514210ba42d4720332538c65956.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revision tags: v5.15.57, v5.15.56, v5.15.55

# 63809137 | 12-Jul-2022 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: flush notifiers after sendzc
Allow flushing notifiers as part of a sendzc request by setting the IORING_SENDZC_FLUSH flag. When the sendzc request succeeds it will flush the used [active] notifier.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>