Revision tags: v6.1.34, v6.1.33 |
|
#
d86eaed1 |
| 07-Jun-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring: cleanup io_aux_cqe() API
Everybody is passing in the request, so get rid of the io_ring_ctx and explicit user_data pass-in. Both the ctx and user_data can be deduced from the request at ha
io_uring: cleanup io_aux_cqe() API
Everybody is passing in the request, so get rid of the io_ring_ctx and explicit user_data pass-in. Both the ctx and user_data can be deduced from the request at hand.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27 |
|
#
6e76ac59 |
| 28-Apr-2023 |
Josh Triplett <josh@joshtriplett.org> |
io_uring: Add io_uring_setup flag to pre-register ring fd and never install it
With IORING_REGISTER_USE_REGISTERED_RING, an application can register the ring fd and use it via registered index rathe
io_uring: Add io_uring_setup flag to pre-register ring fd and never install it
With IORING_REGISTER_USE_REGISTERED_RING, an application can register the ring fd and use it via registered index rather than installed fd. This allows using a registered ring for everything *except* the initial mmap.
With IORING_SETUP_NO_MMAP, io_uring_setup uses buffers allocated by the user, rather than requiring a subsequent mmap.
The combination of the two allows a user to operate *entirely* via a registered ring fd, making it unnecessary to ever install the fd in the first place. So, add a flag IORING_SETUP_REGISTERED_FD_ONLY to make io_uring_setup register the fd and return a registered index, without installing the fd.
This allows an application to avoid touching the fd table at all, and allows a library to never even momentarily install a file descriptor.
This splits out an io_ring_add_registered_file helper from io_ring_add_registered_fd, for use by io_uring_setup.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
96c7d4f8 |
| 04-May-2023 |
Breno Leitao <leitao@debian.org> |
io_uring: Create a helper to return the SQE size
Create a simple helper that returns the size of the SQE. The SQE could have two size, depending of the flags.
If IO_URING_SETUP_SQE128 flag is set,
io_uring: Create a helper to return the SQE size
Create a simple helper that returns the size of the SQE. The SQE could have two size, depending of the flags.
If IO_URING_SETUP_SQE128 flag is set, then return a double SQE, otherwise returns the sizeof of io_uring_sqe (64 bytes).
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230504121856.904491-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.26, v6.3, v6.1.25, v6.1.24 |
|
#
8ce4269e |
| 11-Apr-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: add irq lockdep checks
We don't post CQEs from the IRQ context, add a check catching that.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f23f7a24d
io_uring: add irq lockdep checks
We don't post CQEs from the IRQ context, add a check catching that.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f23f7a24dbe8027b3d37873fece2b6488f878b31.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
8751d154 |
| 06-Apr-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: reduce scheduling due to tw
Every task_work will try to wake the task to be executed, which causes excessive scheduling and additional overhead. For some tw it's justified, but others won'
io_uring: reduce scheduling due to tw
Every task_work will try to wake the task to be executed, which causes excessive scheduling and additional overhead. For some tw it's justified, but others won't do much but post a single CQE.
When a task waits for multiple cqes, every such task_work will wake it up. Instead, the task may give a hint about how many cqes it waits for, io_req_local_work_add() will compare against it and skip wake ups if #cqes + #tw is not enough to satisfy the waiting condition. Task_work that uses the optimisation should be simple enough and never post more than one CQE. It's also ignored for non DEFER_TASKRUN rings.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
8501fe70 |
| 06-Apr-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: add tw add flags
We pass 'allow_local' into io_req_task_work_add() but will need more flags. Replace it with a flags bit field and name this allow_local flag.
Signed-off-by: Pavel Begunko
io_uring: add tw add flags
We pass 'allow_local' into io_req_task_work_add() but will need more flags. Replace it with a flags bit field and name this allow_local flag.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c0f01e7ef4e6feebfb199093cc995af7a19befa.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
6e7248ad |
| 06-Apr-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: refactor io_cqring_wake()
Instead of smp_mb() + __io_cqring_wake() in __io_cq_unlock_post_flush() use equivalent io_cqring_wake(). With that we can clean it up further and remove __io_cqri
io_uring: refactor io_cqring_wake()
Instead of smp_mb() + __io_cqring_wake() in __io_cq_unlock_post_flush() use equivalent io_cqring_wake(). With that we can clean it up further and remove __io_cqring_wake().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/662ee5d898168ac206be06038525e97b64072a46.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.23 |
|
#
e3ef728f |
| 30-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring: cap io_sqring_entries() at SQ ring size
We already do this manually for the !SQPOLL case, do it in general and we can also dump the ugly min3() in io_submit_sqes().
Signed-off-by: Jens Ax
io_uring: cap io_sqring_entries() at SQ ring size
We already do this manually for the !SQPOLL case, do it in general and we can also dump the ugly min3() in io_submit_sqes().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.22 |
|
#
a282967c |
| 27-Mar-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: encapsulate task_work state
For task works we're passing around a bool pointer for whether the current ring is locked or not, let's wrap it in a structure, that will make it more opaque pr
io_uring: encapsulate task_work state
For task works we're passing around a bool pointer for whether the current ring is locked or not, let's wrap it in a structure, that will make it more opaque preventing abuse and will also help us to pass more info in the future if needed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11 |
|
#
2f2bb1ff |
| 06-Feb-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring: mark task TASK_RUNNING before handling resume/task work
Just like for task_work, set the task mode to TASK_RUNNING before doing any potential resume work. We're not holding any locks at th
io_uring: mark task TASK_RUNNING before handling resume/task work
Just like for task_work, set the task mode to TASK_RUNNING before doing any potential resume work. We're not holding any locks at this point, but we may have already set the task state to TASK_INTERRUPTIBLE in preparation for going to sleep waiting for events. Ensure that we set it back to TASK_RUNNING if we have work to process, to avoid warnings on calling blocking operations with !TASK_RUNNING.
Fixes: b5d3ae202fbf ("io_uring: handle TIF_NOTIFY_RESUME when checking for task_work") Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202302062208.24d3e563-oliver.sang@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.10, v6.1.9, v6.1.8 |
|
#
c8576f3e |
| 23-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: refactor req allocation
Follow the io_get_sqe pattern returning the result via a pointer and hide request cache refill inside io_alloc_req().
Signed-off-by: Pavel Begunkov <asml.silence@g
io_uring: refactor req allocation
Follow the io_get_sqe pattern returning the result via a pointer and hide request cache refill inside io_alloc_req().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8c37c2e8a3cb5e4cd6a8ae3b91371227a92708a6.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
c1755c25 |
| 18-Jan-2023 |
Breno Leitao <leitao@debian.org> |
io_uring: Enable KASAN for request cache
Every io_uring request is represented by struct io_kiocb, which is cached locally by io_uring (not SLAB/SLUB) in the list called submit_state.freelist. This
io_uring: Enable KASAN for request cache
Every io_uring request is represented by struct io_kiocb, which is cached locally by io_uring (not SLAB/SLUB) in the list called submit_state.freelist. This patch simply enabled KASAN for this free list.
This list is initially created by KMEM_CACHE, but later, managed by io_uring. This patch basically poisons the objects that are not used (i.e., they are the free list), and unpoisons it when the object is allocated/removed from the list.
Touching these poisoned objects while in the freelist will cause a KASAN warning.
Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
b5d3ae20 |
| 24-Jan-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work() for PF_IO_WORKER threads. They never return to usermode, henc
io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work() for PF_IO_WORKER threads. They never return to usermode, hence never get a chance to process any items that are marked by this flag. Most notably this includes the final put of files, but also any throttling markers set by block cgroups.
Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.7 |
|
#
89800a2d |
| 16-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: don't export io_put_task()
io_put_task() is only used in uring.c so enclose it there together with __io_put_task().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lo
io_uring: don't export io_put_task()
io_put_task() is only used in uring.c so enclose it there together with __io_put_task().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43c7f9227e2ab215f1a6069dadbc5382bed346fe.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.6, v6.1.5, v6.0.19 |
|
#
bca39f39 |
| 09-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: add lazy poll_wq activation
Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do for another waitqueue, it's not going to be the case in the future and so we want to hav
io_uring: add lazy poll_wq activation
Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do for another waitqueue, it's not going to be the case in the future and so we want to have a fast path for it when the ring has never been polled.
Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag called ->poll_activated. The idea behind the flag is to set it when the ring was polled for the first time. This requires additional sync to not miss events, which is done here by using task_work for ->task_complete rings, and by default enabling the flag for all other types of rings.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/060785e8e9137a920b232c0c7f575b131af19cac.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
7b235dd8 |
| 09-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: separate wq for ring polling
Don't use ->cq_wait for ring polling but add a separate wait queue for it. We need it for following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail
io_uring: separate wq for ring polling
Don't use ->cq_wait for ring polling but add a separate wait queue for it. We need it for following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dea0be0bf990503443c5c6c337fc66824af7d590.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
360173ab |
| 09-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: move io_run_local_work_locked
io_run_local_work_locked() is only used in io_uring.c, move it there. With that we can also make __io_run_local_work() static.
Signed-off-by: Pavel Begunkov
io_uring: move io_run_local_work_locked
io_run_local_work_locked() is only used in io_uring.c, move it there. With that we can also make __io_run_local_work() static.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/91757bcb33e5774e49fed6f2b6e058630608119b.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
3e565555 |
| 09-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: mark io_run_local_work static
io_run_local_work is enclosed in io_uring.c, we don't need to export it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org
io_uring: mark io_run_local_work static
io_run_local_work is enclosed in io_uring.c, we don't need to export it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b477fb81f5e77044f724a06fe245d5c078659364.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.0.18, v6.1.4 |
|
#
140102ae |
| 05-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: move defer tw task checks
Most places that want to run local tw explicitly and in advance check if they are allowed to do so. Don't rely on a similar check in __io_run_local_work(), leave
io_uring: move defer tw task checks
Most places that want to run local tw explicitly and in advance check if they are allowed to do so. Don't rely on a similar check in __io_run_local_work(), leave it as a just-in-case warning and make sure callers checks capabilities themselves.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/990fe0e8e70fd4d57e43625e5ce8fba584821d1a.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
1414d629 |
| 05-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: kill io_run_task_work_ctx
There is only one user of io_run_task_work_ctx(), inline it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40953c65f7c88
io_uring: kill io_run_task_work_ctx
There is only one user of io_run_task_work_ctx(), inline it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40953c65f7c88fb00cdc4d870ca5d5319fb3d7ea.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0c4fe008 |
| 05-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: rearrange defer list checks
There should be nothing in the ->work_llist for non DEFER_TASKRUN rings, so we can skip flag checks and test the list emptiness directly. Also move it out of io
io_uring: rearrange defer list checks
There should be nothing in the ->work_llist for non DEFER_TASKRUN rings, so we can skip flag checks and test the list emptiness directly. Also move it out of io_run_local_work() for inlining.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/331d63fd15ca79b35b95c82a82d9246110686392.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.3, v6.0.17 |
|
#
f26cc959 |
| 03-Jan-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: lockdep annotate CQ locking
Locking around CQE posting is complex and depends on options the ring is created with, add more thorough lockdep annotations checking all invariants.
Signed-of
io_uring: lockdep annotate CQ locking
Locking around CQE posting is complex and depends on options the ring is created with, add more thorough lockdep annotations checking all invariants.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/aa3770b4eacae3915d782cc2ab2f395a99b4b232.1672795976.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14 |
|
#
6434ec01 |
| 17-Dec-2022 |
Jens Axboe <axboe@kernel.dk> |
io_uring: don't use TIF_NOTIFY_SIGNAL to test for availability of task_work
Use task_work_pending() as a better test for whether we have task_work or not, TIF_NOTIFY_SIGNAL is only valid if the any
io_uring: don't use TIF_NOTIFY_SIGNAL to test for availability of task_work
Use task_work_pending() as a better test for whether we have task_work or not, TIF_NOTIFY_SIGNAL is only valid if the any of the task_work items had been queued with TWA_SIGNAL as the notification mechanism. Hence task_work_pending() is a more reliable check.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.0.13, v6.1, v6.0.12 |
|
#
6971253f |
| 02-Dec-2022 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: revise completion_lock locking
io_kill_timeouts() doesn't post any events but queues everything to task_work. Locking there is needed for protecting linked requests traversing, we should g
io_uring: revise completion_lock locking
io_kill_timeouts() doesn't post any events but queues everything to task_work. Locking there is needed for protecting linked requests traversing, we should grab completion_lock directly instead of using io_cq_[un]lock helpers. Same goes for __io_req_find_next_prep().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/88e75d481a65dc295cb59722bb1cf76402d1c06b.1670002973.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
f66f7342 |
| 07-Dec-2022 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: skip spinlocking for ->task_complete
->task_complete was added to serialised CQE posting by doing it from the task context only (or fallback wq when the task is dead), and now we can use t
io_uring: skip spinlocking for ->task_complete
->task_complete was added to serialised CQE posting by doing it from the task context only (or fallback wq when the task is dead), and now we can use that to avoid taking ->completion_lock while filling CQ entries. The patch skips spinlocking only in two spots, __io_submit_flush_completions() and flushing in io_aux_cqe, it's safer and covers all cases we care about. Extra care is taken to force taking the lock while queueing overflow entries.
It fundamentally relies on SINGLE_ISSUER to have only one task posting events. It also need to take into account overflowed CQEs, flushing of which happens in the cq wait path, and so this implementation also needs DEFER_TASKRUN to limit waiters. For the same reason we disable it for SQPOLL, and for IOPOLL as it won't benefit from it in any case. DEFER_TASKRUN, SQPOLL and IOPOLL requirement may be relaxed in the future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2a8c91fd82cfcdcc1d2e5bac7051fe2c183bda73.1670384893.git.asml.silence@gmail.com [axboe: modify to apply] Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|