78aefac7 | 18-Jul-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: fix error pbuf checking
Commit bcc87d978b834c298bbdd9c52454c5d0a946e97e upstream.
Syz reports a problem, which boils down to NULL vs IS_ERR inconsistent error handling in io_alloc_pbuf_ring().
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:__io_remove_buffers+0xac/0x700 io_uring/kbuf.c:341
Call Trace:
 <TASK>
 io_put_bl io_uring/kbuf.c:378 [inline]
 io_destroy_buffers+0x14e/0x490 io_uring/kbuf.c:392
 io_ring_ctx_free+0xa00/0x1070 io_uring/io_uring.c:2613
 io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2844
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
 worker_thread+0x86d/0xd40 kernel/workqueue.c:3390
 kthread+0x2f0/0x390 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
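The mismatch is easy to show in miniature. The sketch below is illustrative only (alloc_ring() and setup_ring() are hypothetical names, not the kernel functions): an allocator that reports failure via ERR_PTR() must be paired with IS_ERR() in every caller, otherwise the error pointer survives as a small near-NULL address and is dereferenced later, which is exactly the class of null-ptr-deref in the report above.

  #include <linux/err.h>
  #include <linux/slab.h>

  static void *alloc_ring(size_t size)
  {
      void *ring = kzalloc(size, GFP_KERNEL);

      if (!ring)
          return ERR_PTR(-ENOMEM);  /* failure reported as ERR_PTR() */
      return ring;
  }

  static int setup_ring(void **out)
  {
      void *ring = alloc_ring(4096);

      if (!ring)            /* BUG: never true, ERR_PTR() slips through */
          return -ENOMEM;
      if (IS_ERR(ring))     /* correct check, matching the allocator */
          return PTR_ERR(ring);
      *out = ring;
      return 0;
  }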
Cc: stable@vger.kernel.org Reported-by: syzbot+2074b1a3d447915c6f1c@syzkaller.appspotmail.com Fixes: 87585b05757dc ("io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c5f9df20560bd9830401e8e48abc029e7cfd9f5e.1721329239.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1fdb9c9e | 13-Mar-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring: use unpin_user_pages() where appropriate
Commit 18595c0a58ae29ac6a996c5b664610119b73182d upstream.
There are a few cases of open-coded loops around unpin_user_page(); use the generic helper instead.
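A minimal before/after sketch of the conversion (put_pages_old()/put_pages_new() are illustrative wrappers, not the io_uring call sites):

  #include <linux/mm.h>

  static void put_pages_old(struct page **pages, unsigned long nr_pages)
  {
      unsigned long i;

      /* open-coded loop, one unpin per page */
      for (i = 0; i < nr_pages; i++)
          unpin_user_page(pages[i]);
  }

  static void put_pages_new(struct page **pages, unsigned long nr_pages)
  {
      /* the generic batch helper does the same in one call */
      unpin_user_pages(pages, nr_pages);
  }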
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
46b1b3d8 | 12-Mar-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
Commit 87585b05757dc70545efb434669708d276125559 upstream.
Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_pages() and have it Just Work.
This requires a bit of effort on the mmap lookup side, as the ctx uring_lock isn't held, which otherwise protects buffer_lists from being torn down, and it's not safe to grab from mmap context as that would introduce an ABBA deadlock between the mmap lock and the ctx uring_lock. Instead, look up the buffer_list under RCU, as the list is RCU freed already. Use the existing reference count to determine whether it's possible to safely grab a reference to it (e.g. if it's not zero already), and drop that reference when done with the mapping. If the mmap reference is the last one, the buffer_list and the associated memory can go away, since the vma insertion has references to the inserted pages at that point.
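A rough sketch of that lookup scheme, with illustrative names (bl_get_for_mmap() and struct buf_list are not the exact kernel identifiers): the structure is found under RCU, and a reference is taken only if the count hasn't already dropped to zero.

  #include <linux/xarray.h>
  #include <linux/rcupdate.h>
  #include <linux/atomic.h>

  struct buf_list {
      atomic_t refs;
      /* ... pages backing the ring ... */
  };

  static struct buf_list *bl_get_for_mmap(struct xarray *xa, unsigned long bgid)
  {
      struct buf_list *bl;

      rcu_read_lock();
      bl = xa_load(xa, bgid);
      /* refuse a reference if the list is already on its way out */
      if (bl && !atomic_inc_not_zero(&bl->refs))
          bl = NULL;
      rcu_read_unlock();
      return bl;    /* caller drops the ref when done with the mapping */
  }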
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
af8f27ef | 12-Mar-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring/kbuf: vmap pinned buffer ring
Commit e270bfd22a2a10d1cfbaddf23e79b6d0b405d21e upstream.
This avoids needing to care about HIGHMEM, and it makes the buffer indexing easier as both ring provided buffer methods are now virtually mapped in a contiguous fashion.
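A hedged sketch of the pin-then-vmap shape this describes (map_user_ring() is an illustrative name; error unwinding is simplified):

  #include <linux/mm.h>
  #include <linux/vmalloc.h>

  static void *map_user_ring(unsigned long uaddr, int nr_pages,
                             struct page **pages)
  {
      void *vaddr;
      int ret;

      ret = pin_user_pages_fast(uaddr, nr_pages,
                                FOLL_WRITE | FOLL_LONGTERM, pages);
      if (ret != nr_pages)
          goto err;

      /* one contiguous kernel virtual mapping over all pinned pages */
      vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
      if (vaddr)
          return vaddr;
  err:
      if (ret > 0)
          unpin_user_pages(pages, ret);
      return NULL;
  }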
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6168ec87 | 13-Mar-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring: unify io_pin_pages()
Commit 1943f96b3816e0f0d3d6686374d6e1d617c8b42c upstream.
Move it into io_uring.c where it belongs, and use it in there as well rather than have two implementations of this.
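The unified helper takes roughly this shape, sketched under the usual pin_user_pages_fast() semantics (names and details simplified from the actual io_pin_pages()):

  #include <linux/err.h>
  #include <linux/mm.h>
  #include <linux/slab.h>

  static struct page **pin_pages(unsigned long uaddr, unsigned long len,
                                 int *npages)
  {
      unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
      unsigned long start = uaddr >> PAGE_SHIFT;
      int nr = end - start, ret;
      struct page **pages;

      pages = kvmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
      if (!pages)
          return ERR_PTR(-ENOMEM);

      ret = pin_user_pages_fast(uaddr, nr, FOLL_WRITE | FOLL_LONGTERM,
                                pages);
      if (ret == nr) {
          *npages = nr;
          return pages;
      }
      /* partial pin: undo and report */
      if (ret > 0)
          unpin_user_pages(pages, ret);
      kvfree(pages);
      return ERR_PTR(ret < 0 ? ret : -EFAULT);
  }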
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
719e745e | 13-Mar-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring: use vmap() for ring mapping
Commit 09fc75e0c035a2cabb8caa15cec6e85159dd94f0 upstream.
This is the last holdout which does odd page checking; convert it to vmap, just like what is done for the non-mmap path.
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b89f95b9 | 25-Nov-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: fix corner case forgetting to vunmap
Commit 43eef70e7e2ac74e7767731dd806720c7fb5e010 upstream.
io_pages_unmap() is a bit tricky in trying to figure out whether the pages were previously vmap'ed or not. In particular, if there is just one page it believes there is no need to vunmap. The paired io_pages_map(), however, could've failed io_mem_alloc_compound() and attempted io_mem_alloc_single(), which does vmap, and that leads to an unpaired vmap.
The solution is to fail if io_mem_alloc_compound() can't allocate a single page. That's the easiest way to deal with it, and those two functions are getting removed soon, so no need to overcomplicate it.
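In sketch form, with hypothetical helper names (mem_alloc_compound() and mem_alloc_single() stand in for the real functions), the decision point becomes:

  static void *pages_map(struct page **pages, int nr_pages)
  {
      void *ptr;

      ptr = mem_alloc_compound(pages, nr_pages);  /* no vmap involved */
      if (ptr)
          return ptr;
      if (nr_pages == 1)
          return NULL;    /* the fix: don't vmap a lone page that the
                             unmap side will never vunmap */
      return mem_alloc_single(pages, nr_pages);   /* vmap fallback */
  }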
Cc: stable@vger.kernel.org Fixes: 3ab1db3c6039e ("io_uring: get rid of remap_pfn_range() for mapping rings/sqes") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/477e75a3907a2fe83249e49c0a92cd480b2c60e0.1732569842.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
a0b21f2a | 29-May-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring: don't attempt to mmap larger than what the user asks for
Commit 06fe9b1df1086b42718d632aa57e8f7cd1a66a21 upstream.
If IORING_FEAT_SINGLE_MMAP is ignored, as can happen if an application uses an ancient liburing or does setup manually, then three mmaps are required to map the ring into userspace. The kernel will still have collapsed the mappings; however, userspace may ask to map them individually. If so, then we should not use the full number of ring pages, as it may exceed the partial mapping. Doing so will yield an -EFAULT from vm_insert_pages(), as we pass in more pages than what the application asked for.
Cap the number of pages to match what the application asked for, for the particular mapping operation.
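A hedged sketch of the cap (ring_mmap() is an illustrative name): clamp the page count to the smaller of what the ring spans and what the vma actually covers before handing it to vm_insert_pages().

  #include <linux/mm.h>

  static int ring_mmap(struct vm_area_struct *vma, struct page **pages,
                       unsigned long ring_pages)
  {
      /* limit to what the application's mmap() length covers */
      unsigned long nr = min(vma_pages(vma), ring_pages);

      /* passing more than the vma holds makes vm_insert_pages() -EFAULT */
      return vm_insert_pages(vma, vma->vm_start, pages, &nr);
  }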
Reported-by: Lucas Mülling <lmulling@proton.me> Link: https://github.com/axboe/liburing/issues/1157 Fixes: 3ab1db3c6039 ("io_uring: get rid of remap_pfn_range() for mapping rings/sqes") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2905c4fe | 13-Mar-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring: get rid of remap_pfn_range() for mapping rings/sqes
Commit 3ab1db3c6039e02a9deb9d5091d28d559917a645 upstream.
Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_pages() and have it Just Work.
If possible, allocate a single compound page that covers the range that is needed. If that works, then we can just use page_address() on that page. If we fail to get a compound page, allocate single pages and use vmap() to map them into the kernel virtual address space.
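A condensed sketch of that allocation strategy (illustrative names, cleanup paths trimmed): try one compound allocation first, and stitch single pages together with vmap() only as the fallback.

  #include <linux/gfp.h>
  #include <linux/mm.h>
  #include <linux/vmalloc.h>

  static void *mem_alloc(struct page **pages, int nr_pages)
  {
      struct page *page;
      int i;

      /* first choice: one physically contiguous compound allocation */
      page = alloc_pages(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP,
                         get_order((unsigned long)nr_pages << PAGE_SHIFT));
      if (page) {
          for (i = 0; i < nr_pages; i++)
              pages[i] = page + i;
          return page_address(page);  /* directly addressable */
      }

      /* fallback: individual pages mapped virtually with vmap() */
      for (i = 0; i < nr_pages; i++) {
          pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
          if (!pages[i])
              return NULL;    /* (unwind elided in this sketch) */
      }
      return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
  }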
This just covers the rings/sqes; the other remaining user of the mmap remap_pfn_range() will be converted separately. Once that is done, we can kill the old alloc/free code.
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
570f4d6e | 08-Feb-2025 |
Uday Shankar <ushankar@purestorage.com> |
io-wq: backoff when retrying worker creation
[ Upstream commit 13918315c5dc5a515926c8799042ea6885c2b734 ]
When io_uring submission goes async for the first time on a given task, we'll try to create a worker thread to handle the submission. Creating this worker thread can fail due to various transient conditions, such as an outstanding signal in the forking thread, so we have retry logic with a limit of 3 retries. However, this retry logic appears to be too aggressive/fast - we've observed a thread blowing through the retry limit while having the same outstanding signal the whole time. Here's an excerpt of some tracing that demonstrates the issue:
First, signal 26 is generated for the process. It ends up getting routed to thread 92942.
0) cbd-92284 /* signal_generate: sig=26 errno=0 code=-2 comm=psblkdASD pid=92934 grp=1 res=0 */
This causes create_io_thread in the signalled thread to fail with ERESTARTNOINTR, and thus a retry is queued.
13) task_th-92942  /* io_uring_queue_async_work: ring 000000007325c9ae, request 0000000080c96d8e, user_data 0x0, opcode URING_CMD, flags 0x8240001, normal queue, work 000000006e96dd3f */
13) task_th-92942  io_wq_enqueue() {
13) task_th-92942    _raw_spin_lock();
13) task_th-92942    io_wq_activate_free_worker();
13) task_th-92942    _raw_spin_lock();
13) task_th-92942    create_io_worker() {
13) task_th-92942      __kmalloc_cache_noprof();
13) task_th-92942      __init_swait_queue_head();
13) task_th-92942      kprobe_ftrace_handler() {
13) task_th-92942        get_kprobe();
13) task_th-92942        aggr_pre_handler() {
13) task_th-92942          pre_handler_kretprobe();
13) task_th-92942          /* create_enter: (create_io_thread+0x0/0x50) fn=0xffffffff8172c0e0 arg=0xffff888996bb69c0 node=-1 */
13) task_th-92942        } /* aggr_pre_handler */
...
13) task_th-92942      } /* copy_process */
13) task_th-92942    } /* create_io_thread */
13) task_th-92942    kretprobe_rethook_handler() {
13) task_th-92942      /* create_exit: (create_io_worker+0x8a/0x1a0 <- create_io_thread) arg1=0xfffffffffffffdff */
13) task_th-92942    } /* kretprobe_rethook_handler */
13) task_th-92942    queue_work_on() {
...
The CPU is then handed to a kworker to process the queued retry:
------------------------------------------
13) task_th-92942 => kworker-54154
------------------------------------------
13) kworker-54154  io_workqueue_create() {
13) kworker-54154    io_queue_worker_create() {
13) kworker-54154      task_work_add() {
13) kworker-54154        wake_up_state() {
13) kworker-54154          try_to_wake_up() {
13) kworker-54154            _raw_spin_lock_irqsave();
13) kworker-54154            _raw_spin_unlock_irqrestore();
13) kworker-54154          } /* try_to_wake_up */
13) kworker-54154        } /* wake_up_state */
13) kworker-54154        kick_process();
13) kworker-54154      } /* task_work_add */
13) kworker-54154    } /* io_queue_worker_create */
13) kworker-54154  } /* io_workqueue_create */
And then we immediately switch back to the original task to try creating a worker again. This fails, because the original task still hasn't handled its signal.
------------------------------------------
13) kworker-54154 => task_th-92942
------------------------------------------
13) task_th-92942  create_worker_cont() {
13) task_th-92942    kprobe_ftrace_handler() {
13) task_th-92942      get_kprobe();
13) task_th-92942      aggr_pre_handler() {
13) task_th-92942        pre_handler_kretprobe();
13) task_th-92942        /* create_enter: (create_io_thread+0x0/0x50) fn=0xffffffff8172c0e0 arg=0xffff888996bb69c0 node=-1 */
13) task_th-92942      } /* aggr_pre_handler */
13) task_th-92942    } /* kprobe_ftrace_handler */
13) task_th-92942    create_io_thread() {
13) task_th-92942      copy_process() {
13) task_th-92942        task_active_pid_ns();
13) task_th-92942        _raw_spin_lock_irq();
13) task_th-92942        recalc_sigpending();
13) task_th-92942        _raw_spin_lock_irq();
13) task_th-92942      } /* copy_process */
13) task_th-92942    } /* create_io_thread */
13) task_th-92942    kretprobe_rethook_handler() {
13) task_th-92942      /* create_exit: (create_worker_cont+0x35/0x1b0 <- create_io_thread) arg1=0xfffffffffffffdff */
13) task_th-92942    } /* kretprobe_rethook_handler */
13) task_th-92942    io_worker_release();
13) task_th-92942    queue_work_on() {
13) task_th-92942      clear_pending_if_disabled();
13) task_th-92942      __queue_work() {
13) task_th-92942      } /* __queue_work */
13) task_th-92942    } /* queue_work_on */
13) task_th-92942  } /* create_worker_cont */
The pattern repeats another couple times until we blow through the retry counter, at which point we give up. All outstanding work is canceled, and the io_uring command which triggered all this is failed with ECANCELED:
13) task_th-92942  io_acct_cancel_pending_work() {
...
13) task_th-92942  /* io_uring_complete: ring 000000007325c9ae, req 0000000080c96d8e, user_data 0x0, result -125, cflags 0x0 extra1 0 extra2 0 */
Finally, the task gets around to processing its outstanding signal 26, but it's too late.
13) task_th-92942 /* signal_deliver: sig=26 errno=0 code=-2 sa_handler=59566a0 sa_flags=14000000 */
Try to address this issue by adding a small scaling delay when retrying worker creation. This should give the forking thread time to handle its signal in the above case. This isn't a particularly satisfying solution, as sufficiently paradoxical scheduling would still have us hitting the same issue, and I'm open to suggestions for something better. But this is likely to prevent this (already rare) issue from hitting in practice.
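A sketch of the backoff idea under the stated constraints (the exact delay formula below is illustrative, not necessarily what the patch uses): requeue the creation retry through a delayed work item so the forking task gets a chance to drain its pending signal first.

  #include <linux/workqueue.h>
  #include <linux/jiffies.h>

  #define WORKER_INIT_LIMIT  3    /* retry cap from the existing logic */

  static void queue_create_retry(struct delayed_work *dwork, int retries)
  {
      /* scale the pause with the attempt number, e.g. 1ms, 2ms, 3ms */
      unsigned long delay = msecs_to_jiffies(retries);

      queue_delayed_work(system_wq, dwork, delay);
  }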
Signed-off-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250208-wq_retry-v2-1-4f6f5041d303@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
685da33c | 25-Feb-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/net: save msg_control for compat
[ Upstream commit 6ebf05189dfc6d0d597c99a6448a4d1064439a18 ]
Match the compat part of io_sendmsg_copy_hdr() with its counterpart and save msg_control.
Fixes: c55978024d123 ("io_uring/net: move receive multishot out of the generic msghdr path") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2a8418821fe83d3b64350ad2b3c0303e9b732bbd.1740498502.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
b9826e3b | 14-Feb-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: prevent opcode speculation
commit 1e988c3fe1264708f4f92109203ac5b1d65de50b upstream.
sqe->opcode is used for different tables, make sure we sanitise it against speculations.
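The standard hardening for this pattern is array_index_nospec(); a minimal sketch (check_opcode() is an illustrative wrapper, not the kernel function):

  #include <linux/errno.h>
  #include <linux/nospec.h>
  #include <uapi/linux/io_uring.h>

  static int check_opcode(u8 opcode)
  {
      /* architectural bounds check ... */
      if (opcode >= IORING_OP_LAST)
          return -EINVAL;
      /* ... plus clamping that holds even when the branch above is
         speculatively mispredicted, so table loads can't go OOB */
      return array_index_nospec(opcode, IORING_OP_LAST);
  }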
Cc: stable@vger.kernel.org Fixes: d3656344fea03 ("io_uring: add lookup table for various opcode needs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/7eddbf31c8ca0a3947f8ed98271acc2b4349c016.1739568408.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
146a185f | 12-Feb-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/kbuf: reallocate buf lists on upgrade
commit 8802766324e1f5d414a81ac43365c20142e85603 upstream.
IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it was created for a legacy selected buffer and has been emptied. That violates the requirement that most of the fields stay stable after publication. Always reallocate it instead.
Cc: stable@vger.kernel.org Reported-by: Pumpkin Chang <pumpkin@devco.re> Fixes: 2fcabce2d7d34 ("io_uring: disallow mixed provided buffer group registrations") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
644636ee | 10-Feb-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/rw: commit provided buffer state on async
When we get -EIOCBQUEUED, we need to ensure that the buffer is consumed from the provided buffer ring, which can be done with io_kbuf_recycle() + REQ_F_PARTIAL_IO.
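In sketch form (a fragment, assuming the kernel's io_kbuf_recycle() helper and REQ_F_PARTIAL_IO flag as named above), the completion path would do something like:

  if (ret == -EIOCBQUEUED) {
      /* buffer already picked for this request: flag the partial IO,
         so the recycle call commits the provided-buffer ring state */
      req->flags |= REQ_F_PARTIAL_IO;
      io_kbuf_recycle(req, issue_flags);
  }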
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Jacob Soo <jacob.soo@starlabs.sg> Fixes: c7fb19428d67d ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
a94592ec | 10-Feb-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: fix io_req_prep_async with provided buffers
io_req_prep_async() can import provided buffers; commit the ring state by giving up on the buffer beforehand, as it'll be reimported later if needed.
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Jacob Soo <jacob.soo@starlabs.sg> Fixes: c7fb19428d67d ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
130675a2 | 30-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/net: don't retry connect operation on EPOLLERR
commit 8c8492ca64e79c6e0f433e8c9d2bcbd039ef83d0 upstream.
If a socket is shutdown before the connection completes, POLLERR is set in the poll mask. However, connect ignores this as it doesn't know, and attempts the connection again. This may lead to a bogus -ETIMEDOUT result, where it should have noticed the POLLERR and just returned -ECONNRESET instead.
Have the poll logic check for whether or not POLLERR is set in the mask, and if so, mark the request as failed. Then connect can appropriately fail the request rather than retry it.
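As a fragment sketch of that poll-side check (identifiers approximate the io_uring poll path, slightly simplified):

  if (unlikely(mask & EPOLLERR)) {
      /* don't re-arm: complete as failed so connect can surface the
         real socket error (e.g. -ECONNRESET) instead of -ETIMEDOUT */
      req_set_fail(req);
  }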
Reported-by: Sergey Galas <ssgalas@cloud.ru> Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/discussions/1335 Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b86f1d51 | 27-Jan-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: fix multishots with selected buffers
commit d63b0e8a628e62ca85a0f7915230186bb92f8bb4 upstream.
We do io_kbuf_recycle() when arming a poll but every iteration of a multishot can grab more buffers, which is why we need to flush the kbuf ring state before continuing with waiting.
Cc: stable@vger.kernel.org Fixes: b3fdea6ecb55c ("io_uring: multishot recv") Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Jacob Soo <jacob.soo@starlabs.sg> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1bfc9990fe435f1fc6152ca9efeba5eb3e68339c.1738025570.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
563ba170 | 22-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/uring_cmd: use cached cmd_op in io_uring_cmd_sock()
[ Upstream commit d58d82bd0efd6c8edd452fc2f6c6dd052ec57cb2 ]
io_uring_cmd_sock() does a normal read of cmd->sqe->cmd_op, where it really should be using a READ_ONCE() as ->sqe may still be pointing to the original SQE. Since the prep side already does this READ_ONCE() and stores it locally, use that value rather than re-read it.
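A fragment sketch of the pattern (prep snapshots the value once; issue must use the snapshot):

  /* prep: ->sqe still points at the shared, user-writable SQ ring */
  ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);

  /* issue (the fix): use the cached copy, never re-read cmd->sqe */
  switch (ioucmd->cmd_op) {
  case SOCKET_URING_OP_SIOCINQ:
      /* ... */
      break;
  }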
Fixes: 8e9fad0e70b7b ("io_uring: Add io_uring command support for sockets") Link: https://lore.kernel.org/r/20250121-uring-sockcmd-fix-v1-1-add742802a29@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
8efff2aa | 08-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period
Commit c9a40292a44e78f71258b8522655bffaf5753bdb upstream.
io_eventfd_do_signal() is invoked from an RCU callback, but when dropping the reference to the io_ev_fd, it calls io_eventfd_free() directly if the refcount drops to zero. This isn't correct, as any potential freeing of the io_ev_fd should be deferred another RCU grace period.
Just call io_eventfd_put() rather than open-code the dec-and-test and free, which will correctly defer it another RCU grace period.
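The resulting pattern, sketched with illustrative names (struct ev_fd here is not the exact kernel struct): a refcount whose final put from RCU context frees through another call_rcu() grace period.

  #include <linux/rcupdate.h>
  #include <linux/refcount.h>
  #include <linux/slab.h>

  struct ev_fd {
      refcount_t refs;
      struct rcu_head rcu;
  };

  static void ev_fd_free(struct rcu_head *rcu)
  {
      kfree(container_of(rcu, struct ev_fd, rcu));
  }

  static void ev_fd_put(struct ev_fd *ev_fd)
  {
      if (refcount_dec_and_test(&ev_fd->refs))
          /* defer the actual free one more grace period */
          call_rcu(&ev_fd->rcu, ev_fd_free);
  }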
Fixes: 21a091b970cd ("io_uring: signal registered eventfd to process deferred task work") Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
e3ed5a14 | 04-Jan-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/timeout: fix multishot updates
commit c83c846231db8b153bfcb44d552d373c34f78245 upstream.
After an update, only the first shot of a multishot timeout request adheres to the new timeout value; all subsequent retries continue to use the old value. Don't forget to update the timeout stored in struct io_timeout_data.
Cc: stable@vger.kernel.org Fixes: ea97f6c8558e8 ("io_uring: add support for multishot timeouts") Reported-by: Christian Mazakas <christian.mazakas@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e6516c3304eb654ec234cfa65c88a9579861e597.1736015288.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
80120bb4 | 26-Dec-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/sqpoll: fix sqpoll error handling races
commit e33ac68e5e21ec1292490dfe061e75c0dbdd3bd4 upstream.
BUG: KASAN: slab-use-after-free in __lock_acquire+0x370b/0x4a10 kernel/locking/lockdep.c:5089
Call Trace:
 <TASK>
 ...
 _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
 class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
 try_to_wake_up+0xb5/0x23c0 kernel/sched/core.c:4205
 io_sq_thread_park+0xac/0xe0 io_uring/sqpoll.c:55
 io_sq_thread_finish+0x6b/0x310 io_uring/sqpoll.c:96
 io_sq_offload_create+0x162/0x11d0 io_uring/sqpoll.c:497
 io_uring_create io_uring/io_uring.c:3724 [inline]
 io_uring_setup+0x1728/0x3230 io_uring/io_uring.c:3806
 ...
Kun Hu reports that the SQPOLL creation error path has a UAF, which happens if io_uring_alloc_task_context() fails and io_sq_thread() then manages to run and complete before the rest of the error handling code, which means io_sq_thread_finish() is looking at an already killed task.
Note that this is mostly theoretical, requiring fault injection on the allocation side to trigger in practice.
Cc: stable@vger.kernel.org Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0f2f1aa5729332612bd01fe0f2f385fd1f06ce7c.1735231717.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4cba4412 | 18-Mar-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/rw: avoid punting to io-wq directly
Commit 6e6b8c62120a22acd8cb759304e4cd2e3215d488 upstream.
kiocb_done() shouldn't need to care about specifically redirecting requests to io-wq. Remove the hop to task work that then queues io-wq work; return -EAGAIN and let the core io_uring code handle the offloading.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 6e6b8c62120a22acd8cb759304e4cd2e3215d488) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
41928840 | 10-Sep-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
Commit c0a9d496e0fece67db777bd48550376cf2960c47 upstream.
Some file systems, ocfs2 in this case, will return -EOPNOTSUPP for an IOCB_NOWAIT read/write attempt. While this can be argued to be correct, the usual return value for something that requires blocking issue is -EAGAIN.
A refactoring io_uring commit dropped calling kiocb_done() for negative return values, which is otherwise where we already do that transformation. To ensure we catch it in both spots, check it in __io_read() itself as well.
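A fragment sketch of the translation (the IOCB_NOWAIT condition is my assumption of the guard; the real patch keys off the nonblocking issue attempt):

  if (ret == -EOPNOTSUPP && (kiocb->ki_flags & IOCB_NOWAIT))
      ret = -EAGAIN;    /* route through the usual blocking retry */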
Reported-by: Robert Sander <r.sander@heinlein-support.de> Link: https://fosstodon.org/@gurubert@mastodon.gurubert.de/113112431889638440 Cc: stable@vger.kernel.org Fixes: a08d195b586a ("io_uring/rw: split io_read() into a helper") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6c27fc6a | 11-Sep-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/rw: split io_read() into a helper
Commit a08d195b586a217d76b42062f88f375a3eedda4d upstream.
Add __io_read() which does the grunt of the work, leaving the completion side to the new io_read(). No functional changes in this patch.
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit a08d195b586a217d76b42062f88f375a3eedda4d) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2ca94c8d | 19-Dec-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: check if iowq is killed before queuing
commit dbd2ca9367eb19bc5e269b8c58b0b1514ada9156 upstream.
Task work can be executed after the task has gone through io_uring termination, whether it's the final task_work run or the fallback path. In this case, task work will find ->io_wq already killed and NULL'ed, which is a problem if it then tries to forward the request to io_queue_iowq(). Make io_queue_iowq() fail requests in this case.
Note that it also checks PF_KTHREAD, because the user can first close a DEFER_TASKRUN ring and shortly after kill the task, in which case the ->io_wq check would race.
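A fragment sketch of the guard as described (identifiers approximate the io_uring internals of this era):

  struct io_uring_task *tctx = req->task->io_uring;

  if (unlikely(!tctx || !tctx->io_wq || (current->flags & PF_KTHREAD))) {
      /* ring torn down (or task exiting): fail instead of forwarding */
      io_req_task_queue_fail(req, -ECANCELED);
      return;
  }
  io_wq_enqueue(tctx->io_wq, &req->work);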
Cc: stable@vger.kernel.org Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd") Fixes: 773af69121ecc ("io_uring: always reissue from task_work context") Reported-by: Will <willsroot@protonmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.1734637909.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>