a52d4f65 | 28-Sep-2023 | Jens Axboe <axboe@kernel.dk>
io_uring/fs: remove sqe->rw_flags checking from LINKAT
This is unionized with the actual link flags, so they can of course be set and they will be evaluated further down. With the check in place, we would fail any LINKAT that has to set option flags.
Fixes: cf30da90bc3a ("io_uring: add support for IORING_OP_LINKAT")
Cc: stable@vger.kernel.org
Reported-by: Thomas Leonard <talex5@gmail.com>
Link: https://github.com/axboe/liburing/issues/955
Signed-off-by: Jens Axboe <axboe@kernel.dk>

023464fe | 07-Sep-2023 | Jens Axboe <axboe@kernel.dk>
Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"
This reverts commit b484a40dc1f16edb58e5430105a021e1916e6f27.
This commit cancels all requests with io-wq, not just the ones from
Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"
This reverts commit b484a40dc1f16edb58e5430105a021e1916e6f27.
This commit cancels all requests with io-wq, not just the ones from the originating task. This breaks use cases that have thread pools, or just multiple tasks issuing requests on the same ring. The liburing regression test for this also shows that problem:
$ test/thread-exit.t
cqe->res=-125, Expected 512
where an IO thread gets its request canceled rather than complete successfully.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27122c07 | 07-Sep-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: fix unprotected iopoll overflow
[ 71.490669] WARNING: CPU: 3 PID: 17070 at io_uring/io_uring.c:769 io_cqring_event_overflow+0x47b/0x6b0
[ 71.498381] Call Trace:
[ 71.498590]  <TASK>
[ 71.501858]  io_req_cqe_overflow+0x105/0x1e0
[ 71.502194]  __io_submit_flush_completions+0x9f9/0x1090
[ 71.503537]  io_submit_sqes+0xebd/0x1f00
[ 71.503879]  __do_sys_io_uring_enter+0x8c5/0x2380
[ 71.507360]  do_syscall_64+0x39/0x80
We decoupled CQ locking from ->task_complete but haven't fixed up places forcing locking for CQ overflows.
Fixes: ec26c225f06f5 ("io_uring: merge iopoll and normal completion paths")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

45500dc4 | 07-Sep-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: break out of iowq iopoll on teardown
io-wq will retry iopoll even when it fails with -EAGAIN. If that races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers, such workers might spin forever, retrying iopoll again and again and failing each time on some allocation, wait, etc. Don't keep spinning if io-wq is dying.
Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
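
For illustration only, a minimal userspace sketch of the pattern the fix applies (break out of an -EAGAIN retry loop once teardown starts); do_iopoll() and pool_is_dying() are hypothetical stand-ins, not kernel APIs:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool dying;

static int do_iopoll(void)
{
    return -EAGAIN;               /* always asks to be retried */
}

static bool pool_is_dying(void)
{
    return dying;
}

int main(void)
{
    int ret, attempts = 0;

    dying = true;                 /* teardown already in progress */
    while ((ret = do_iopoll()) == -EAGAIN) {
        attempts++;
        if (pool_is_dying())
            break;                /* the fix: don't keep spinning */
    }
    printf("stopped after %d attempt(s), ret=%d\n", attempts, ret);
    return 0;
}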

76d3ccec | 21-Aug-2023 | Matteo Rizzo <matteorizzo@google.com>
io_uring: add a sysctl to disable io_uring system-wide
Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or 2. When 0 (the default), all processes are allowed to create io_uring instances, which is the current behavior. When 1, io_uring creation is disabled (io_uring_setup() will fail with -EPERM) for unprivileged processes not in the kernel.io_uring_group group. When 2, calls to io_uring_setup() fail with -EPERM regardless of privilege.
Signed-off-by: Matteo Rizzo <matteorizzo@google.com>
[JEM: modified to add io_uring_group]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/x49y1i42j1z.fsf@segfault.boston.devel.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
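
For illustration only (not part of the patch), a minimal userspace probe of the behaviour described above; it assumes nothing beyond the raw io_uring_setup() syscall and the documented -EPERM failure:

#include <errno.h>
#include <linux/io_uring.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct io_uring_params p;
    int fd;

    memset(&p, 0, sizeof(p));
    fd = syscall(__NR_io_uring_setup, 8, &p);     /* 8-entry ring */
    if (fd < 0 && errno == EPERM) {
        puts("io_uring creation disabled (kernel.io_uring_disabled)");
        return 1;
    }
    if (fd < 0) {
        printf("io_uring_setup: %s\n", strerror(errno));
        return 1;
    }
    puts("io_uring available");
    close(fd);
    return 0;
}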

32f5dea0 | 01-Sep-2023 | Jens Axboe <axboe@kernel.dk>
io_uring/fdinfo: only print ->sq_array[] if it's there
If a ring is set up with IORING_SETUP_NO_SQARRAY, then we don't have the SQ array. Don't try to dump info from it through fdinfo if that is the case.
Reported-by: syzbot+216e2ea6e0bf4a0acdd7@syzkaller.appspotmail.com
Fixes: 2af89abda7d9 ("io_uring: add option to remove SQ indirection")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b484a40d | 01-Sep-2023 | Ming Lei <ming.lei@redhat.com>
io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
io_wq_put_and_exit() is called from do_exit(), but all FIXED_FILE requests in io_wq aren't canceled in io_uring_cancel_generic() called from do_exit(). Meanwhile, the io_wq IO code path may share resources with the normal iopoll code path.

So if any HIPRI request is submitted via io_wq, this request may not get the resources to move on, given iopoll isn't possible in io_wq_put_and_exit().
The issue can be triggered when terminating 't/io_uring -n4 /dev/nullb0' with default null_blk parameters.
Fix it by always cancelling all requests in io_wq by adding a helper, io_uring_cancel_wq(). This is reasonable because destroying io_wq immediately follows cancelling its requests.
Closes: https://lore.kernel.org/linux-block/3893581.1691785261@warthog.procyon.org.uk/
Reported-by: David Howells <dhowells@redhat.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230901134916.2415386-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bd6fc5da | 28-Aug-2023 | Gabriel Krisman Bertazi <krisman@suse.de>
io_uring: Don't set affinity on a dying sqpoll thread
Syzbot reported a null-ptr-deref of sqd->thread inside io_sqpoll_wq_cpu_affinity. It turns out the sqd->thread can go away from under us during io_uring_register, in case the process gets a fatal signal during io_uring_register.
It is not particularly hard to hit the race, and while I am not sure this is the exact case hit by syzbot, this patch solves it. Finally, checking ->thread is enough to close the race because we locked sqd while "parking" the thread, thus preventing it from going away.
I reproduced it fairly consistently with a program that does:
int main(void)
{
    ...
    io_uring_queue_init(RING_LEN, &ring1, IORING_SETUP_SQPOLL);
    while (1) {
        io_uring_register_iowq_aff(ring, 1, &mask);
    }
}
Executed in a loop with timeout to trigger SIGTERM:

while true; do timeout 1 /a.out ; done
This will hit the following BUG() in very few attempts.
BUG: kernel NULL pointer dereference, address: 00000000000007a8
PGD 800000010e949067 P4D 800000010e949067 PUD 10e46e067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 15715 Comm: dead-sqpoll Not tainted 6.5.0-rc7-next-20230825-g193296236fa0-dirty #23
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:io_sqpoll_wq_cpu_affinity+0x27/0x70
Code: 90 90 90 0f 1f 44 00 00 55 53 48 8b 9f 98 03 00 00 48 85 db 74 4f 48 89 df 48 89 f5 e8 e2 f8 ff ff 48 8b 43 38 48 85 c0 74 22 <48> 8b b8 a8 07 00 00 48 89 ee e8 ba b1 00 00 48 89 df 89 c5 e8 70
RSP: 0018:ffffb04040ea7e70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff93c010749e40 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffa7653331 RDI: 00000000ffffffff
RBP: ffffb04040ea7eb8 R08: 0000000000000000 R09: c0000000ffffdfff
R10: ffff93c01141b600 R11: ffffb04040ea7d18 R12: ffff93c00ea74840
R13: 0000000000000011 R14: 0000000000000000 R15: ffff93c00ea74800
FS:  00007fb7c276ab80(0000) GS:ffff93c36f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000007a8 CR3: 0000000111634003 CR4: 0000000000370ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die_body+0x1a/0x60
 ? page_fault_oops+0x154/0x440
 ? do_user_addr_fault+0x174/0x7b0
 ? exc_page_fault+0x63/0x140
 ? asm_exc_page_fault+0x22/0x30
 ? io_sqpoll_wq_cpu_affinity+0x27/0x70
 __io_register_iowq_aff+0x2b/0x60
 __io_uring_register+0x614/0xa70
 __x64_sys_io_uring_register+0xaa/0x1a0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fb7c226fec9
Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2c0674f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb7c226fec9
RDX: 00007ffe2c067530 RSI: 0000000000000011 RDI: 0000000000000003
RBP: 00007ffe2c0675d0 R08: 00007ffe2c067550 R09: 00007ffe2c067550
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe2c067750 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: 00000000000007a8
---[ end trace 0000000000000000 ]---
Reported-by: syzbot+c74fea926a78b8a91042@syzkaller.appspotmail.com
Fixes: ebdfefc09c6d ("io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/87v8cybuo6.fsf@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0aa7aa5f | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: move multishot cqe cache in ctx
We cache multishot CQEs before flushing them to the CQ in submit_state.cqe. It's a 16-entry cache totalling 256 bytes in the middle of the io_submit_state structure. Move it out of there; it should help with CPU caches for the submission state, and shouldn't affect cached CQEs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dbe1f39c043ee23da918836be44fcec252ce6711.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2af89abd | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: add option to remove SQ indirection
Not many are aware, but the io_uring submission queue has two levels. The first level usually appears as sq_array and stores indexes into the actual SQ.

To my knowledge, no one has ever seriously used it, nor does liburing expose it to users. Add IORING_SETUP_NO_SQARRAY; when set, we don't bother creating and using the sq_array, and the SQ head/tail will point directly into the SQ. This improves the memory footprint, in terms of both allocations and cache usage, and should also make io_get_sqe() less branchy in the end.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0ffa3268a5ef61d326201ff43a233315c96312e0.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
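
For illustration only, a toy userspace model of the two lookups; the arrays below are simplified stand-ins, not the kernel's structures:

#include <stdio.h>

struct sqe { unsigned long long user_data; };

int main(void)
{
    struct sqe sqes[4] = { {101}, {102}, {103}, {104} };  /* the actual SQ */
    unsigned sq_array[4] = { 2, 0, 3, 1 };  /* first level: indexes into sqes[] */
    unsigned head = 0, mask = 3;

    /* Classic layout: head indexes sq_array, which indexes the SQE. */
    struct sqe *two_level = &sqes[sq_array[head & mask]];

    /* IORING_SETUP_NO_SQARRAY: head indexes the SQE ring directly. */
    struct sqe *direct = &sqes[head & mask];

    printf("two-level pick: %llu, direct pick: %llu\n",
           two_level->user_data, direct->user_data);
    return 0;
}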

093a650b | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: force inline io_fill_cqe_req
There are only 2 callers of io_fill_cqe_req left, and one of them is extremely hot. Force inline the function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ffce4fc5e3521966def848a4d930586dfe33ae11.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
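
For illustration only, the force-inline pattern in plain C; the kernel wraps this in its __always_inline macro, while this sketch uses the underlying GCC/Clang attribute directly:

#include <stdio.h>

/* The attribute makes the compiler paste the body into every call site,
 * even when it would otherwise decide not to inline. */
static inline __attribute__((always_inline)) int add_one(int x)
{
    return x + 1;
}

int main(void)
{
    printf("%d\n", add_one(41));
    return 0;
}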

ec26c225 | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: merge iopoll and normal completion paths
io_do_iopoll() and io_submit_flush_completions() are pretty similar; both fill CQEs and then free a list of requests. Don't duplicate it: make iopoll use __io_submit_flush_completions(), which also helps with inlining and other optimisations.
For that, we need to first find all completed iopoll requests and splice them from the iopoll list and then pass it down. This adds one extra list traversal, which should be fine as requests will stay hot in cache.
CQ locking is already conditional, introduce ->lockless_cq and skip locking for IOPOLL as it's protected by ->uring_lock.
We also add a wakeup optimisation for IOPOLL to __io_cq_unlock_post(), so it works just like io_cqring_ev_posted_iopoll().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3840473f5e8a960de35b77292026691880f6bdbc.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
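
For illustration only, a userspace model of the conditional CQ locking described above; the type and field names (ring, lockless_cq) are simplified stand-ins for the kernel's:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct ring {
    bool lockless_cq;                 /* IOPOLL: already under ->uring_lock */
    pthread_mutex_t completion_lock;
};

static void cq_lock(struct ring *r)
{
    if (!r->lockless_cq)
        pthread_mutex_lock(&r->completion_lock);
}

static void cq_unlock(struct ring *r)
{
    if (!r->lockless_cq)
        pthread_mutex_unlock(&r->completion_lock);
}

int main(void)
{
    struct ring r = { .lockless_cq = true,
                      .completion_lock = PTHREAD_MUTEX_INITIALIZER };

    cq_lock(&r);                      /* no-op for the iopoll-style ring */
    puts("CQE posted");
    cq_unlock(&r);
    return 0;
}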

54927baf | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: reorder cqring_flush and wakeups
Unlike in the past, io_commit_cqring_flush() doesn't do anything that may need io_cqring_wake() to be issued afterwards; all requests it completes will go via task_work. Do io_commit_cqring_flush() after io_cqring_wake() to clean up __io_cq_unlock_post().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ed32dcfeec47e6c97bd6b18c152ddce5b218403f.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

59fbc409 | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: optimise extra io_get_cqe null check
If the cached cqe check passes in io_get_cqe*(), it already means that the cqe we return is valid and non-zero; however, the compiler is unable to optimise away NULL checks like the one in io_fill_cqe_req().

Do a bit of trickery: return a success/fail boolean from io_get_cqe*() and store the cqe in the cqe parameter. That makes it do the right thing, erasing the check together with the introduced indirection.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/322ea4d3377d3d4efd8ae90ab8ed28a99f518210.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
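
For illustration only, a userspace model of the boolean-plus-output-parameter trick; get_cqe() is a simplified stand-in for io_get_cqe*():

#include <stdbool.h>
#include <stdio.h>

struct cqe { int res; };

/* Returning a bool lets callers branch on the return value instead of
 * NULL-checking the pointer, so the compiler can drop the extra check. */
static bool get_cqe(struct cqe *cached, unsigned *nr, struct cqe **out)
{
    if (*nr == 0)
        return false;
    *out = &cached[--(*nr)];
    return true;
}

int main(void)
{
    struct cqe cache[2] = { { .res = 1 }, { .res = 2 } };
    unsigned nr = 2;
    struct cqe *cqe;

    while (get_cqe(cache, &nr, &cqe))
        printf("res=%d\n", cqe->res);   /* no NULL check needed here */
    return 0;
}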

20d6b633 | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: refactor __io_get_cqe()
Make __io_get_cqe() simpler by not grabbing the cqe from the refilled cache, but letting io_get_cqe() do it for us. That's cleaner and removes some duplication.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/74dc8fdf2657e438b2e05e1d478a3596924604e9.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b24c5d75 | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: simplify big_cqe handling
Don't keep the big_cqe bits of a req in a union with hash_node; find a separate space for them instead. It's a bit safer, and if we keep them always initialised, we can also get rid of the ugly REQ_F_CQE32_INIT handling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/447aa1b2968978c99e655ba88db536e903df0fe9.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
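
For illustration only, a userspace model of why the shared storage was fragile; both layouts below are simplified stand-ins, not the kernel's io_kiocb:

#include <stdio.h>

struct req_shared {
    union {                       /* old scheme: one slot, two meanings */
        void *hash_node;
        struct { unsigned long long extra1, extra2; } big_cqe;
    };
};

struct req_split {                /* new scheme: independent fields */
    void *hash_node;
    struct { unsigned long long extra1, extra2; } big_cqe;
};

int main(void)
{
    struct req_shared s = { .big_cqe = { 1, 2 } };
    s.hash_node = &s;             /* clobbers big_cqe.extra1 */
    printf("shared: extra1=%llu\n", s.big_cqe.extra1);

    struct req_split d = { .big_cqe = { 1, 2 } };
    d.hash_node = &d;             /* big_cqe stays intact */
    printf("split:  extra1=%llu\n", d.big_cqe.extra1);
    return 0;
}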

31d3ba92 | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: cqe init hardening
io_kiocb::cqe stores the completion info which we'll memcpy to userspace, and we rely on callbacks and other later steps to populate it with the right values. We have never had problems with that, but it would still be safer to zero it on allocation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b16a3b64dde678686460d3c3792c3ba6d3d1bc7a.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a0727c73 | 24-Aug-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: improve cqe !tracing hot path
While looking at io_fill_cqe_req()'s asm I stumbled on our trace points turning into the chunk below:
trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
                        req->cqe.res, req->cqe.flags,
                        req->extra1, req->extra2);

io_uring/io_uring.c:898: trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
    movq 232(%rbx), %rdi    # req_44(D)->big_cqe.extra2, _5
    movq 224(%rbx), %rdx    # req_44(D)->big_cqe.extra1, _6
    movl 84(%rbx), %r9d     # req_44(D)->cqe.D.81184.flags, _7
    movl 80(%rbx), %r8d     # req_44(D)->cqe.res, _8
    movq 72(%rbx), %rcx     # req_44(D)->cqe.user_data, _9
    movq 88(%rbx), %rsi     # req_44(D)->ctx, _10
./arch/x86/include/asm/jump_label.h:27: asm_volatile_goto("1:"
    1:jmp .L1772            # objtool NOPs this
    # ...
It does a jump_label for actual tracing, but those 6 moves will stay there in the hottest io_uring path. As an optimisation, add a trace_io_uring_complete_enabled() check, which also uses jump_labels; it tricks the compiler into behaving and removes the junk without changing anything else in the hot path.

Note: apparently it's not only me noticing it, and people are also working around it. We should remove the check when it's solved generically, or rework tracing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/555d8312644b3776f4be7e23f9b92943875c4bc7.1692916914.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
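
For illustration only, a userspace model of guarding argument setup behind an enabled check; trace_enabled stands in for the jump label, and all names here are hypothetical:

#include <stdbool.h>
#include <stdio.h>

static bool trace_enabled;        /* stands in for the static branch */

struct req { unsigned long long user_data; int res; unsigned flags; };

static void trace_complete(unsigned long long user_data, int res,
                           unsigned flags)
{
    printf("complete: %llu res=%d flags=%u\n", user_data, res, flags);
}

static void fill_cqe(struct req *req)
{
    /* The guard keeps the argument loads off the hot path: they only
     * happen on the rarely-taken tracing branch. */
    if (trace_enabled)
        trace_complete(req->user_data, req->res, req->flags);
    /* ... actual CQE filling ... */
}

int main(void)
{
    struct req r = { 42, 0, 0 };

    fill_cqe(&r);                 /* silent: tracing disabled */
    trace_enabled = true;
    fill_cqe(&r);                 /* traced */
    return 0;
}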

e484fd73 | 17-Aug-2023 | Amir Goldstein <amir73il@gmail.com>
io_uring: use kiocb_{start,end}_write() helpers
Use helpers instead of the open coded dance to silence lockdep warnings.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <20230817141337.1025891-5-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>