Revision tags: v6.6.35, v6.6.34, v6.6.33 |
|
#
1bbadf95 |
| 04-Jun-2024 |
Su Hui <suhui@nfschina.com> |
io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]
Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051,
io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]
Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051, column 3 The expression is an uninitialized value. The computed value will also be garbage.
'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is not fully initialized. Change the order of assignment for 'match' to fix this problem.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Su Hui <suhui@nfschina.com> Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31 |
|
#
ab702c34 |
| 07-May-2024 |
Breno Leitao <leitao@debian.org> |
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]
Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]
Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to address potential data races.
The structure io_worker->flags may be accessed through various data paths, leading to concurrency issues. When KCSAN is enabled, it reveals data races occurring in io_worker_handle_work and io_wq_activate_free_worker functions.
BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28: io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569) io_wq_worker (io_uring/io-wq.c:?) <snip>
read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5: io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285) io_wq_enqueue (io_uring/io-wq.c:947) io_queue_iowq (io_uring/io_uring.c:524) io_req_task_submit (io_uring/io_uring.c:1511) io_handle_tw_list (io_uring/io_uring.c:1198) <snip>
Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm").
These races involve writes and reads to the same memory location by different tasks running on different CPUs. To mitigate this, refactor the code to use atomic operations such as set_bit(), test_bit(), and clear_bit() instead of basic "and" and "or" operations. This ensures thread-safe manipulation of worker flags.
Also, move `create_index` to avoid holes in the structure.
Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240507170002.2269003-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 91215f70ea85 ("io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33 |
|
#
1bbadf95 |
| 04-Jun-2024 |
Su Hui <suhui@nfschina.com> |
io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]
Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051,
io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]
Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051, column 3 The expression is an uninitialized value. The computed value will also be garbage.
'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is not fully initialized. Change the order of assignment for 'match' to fix this problem.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Su Hui <suhui@nfschina.com> Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31 |
|
#
ab702c34 |
| 07-May-2024 |
Breno Leitao <leitao@debian.org> |
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]
Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]
Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to address potential data races.
The structure io_worker->flags may be accessed through various data paths, leading to concurrency issues. When KCSAN is enabled, it reveals data races occurring in io_worker_handle_work and io_wq_activate_free_worker functions.
BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28: io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569) io_wq_worker (io_uring/io-wq.c:?) <snip>
read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5: io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285) io_wq_enqueue (io_uring/io-wq.c:947) io_queue_iowq (io_uring/io_uring.c:524) io_req_task_submit (io_uring/io_uring.c:1511) io_handle_tw_list (io_uring/io_uring.c:1198) <snip>
Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm").
These races involve writes and reads to the same memory location by different tasks running on different CPUs. To mitigate this, refactor the code to use atomic operations such as set_bit(), test_bit(), and clear_bit() instead of basic "and" and "or" operations. This ensures thread-safe manipulation of worker flags.
Also, move `create_index` to avoid holes in the structure.
Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240507170002.2269003-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 91215f70ea85 ("io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28 |
|
#
f32f810d |
| 15-Apr-2024 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active
io-wq: write next_work before dropping acct_lock
[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]
Commit 361aee450c6e ("io-wq: add intermediate work step between pending list and active work") closed a race between a cancellation and the work being removed from the wq for execution. To ensure the request is always reachable by the cancellation, we need to move it within the wq lock, which also synchronizes the cancellation. But commit 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") replaced the wq lock here and accidentally reintroduced the race by releasing the acct_lock too early.
In other words:
worker | cancellation work = io_get_next_work() | raw_spin_unlock(&acct->lock); | | | io_acct_cancel_pending_work | io_wq_worker_cancel() worker->next_work = work
Using acct_lock is still enough since we synchronize on it on io_acct_cancel_pending_work.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6 |
|
#
0f8baa3c |
| 05-Oct-2023 |
Jeff Moyer <jmoyer@redhat.com> |
io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()
I received a bug report with the following signature:
[ 1759.937637] BUG: unable to handle page fault for address: ffff
io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()
I received a bug report with the following signature:
[ 1759.937637] BUG: unable to handle page fault for address: ffffffffffffffe8 [ 1759.944564] #PF: supervisor read access in kernel mode [ 1759.949732] #PF: error_code(0x0000) - not-present page [ 1759.954901] PGD 7ab615067 P4D 7ab615067 PUD 7ab617067 PMD 0 [ 1759.960596] Oops: 0000 1 PREEMPT SMP PTI [ 1759.964804] CPU: 15 PID: 109 Comm: cpuhp/15 Kdump: loaded Tainted: G X ------- — 5.14.0-362.3.1.el9_3.x86_64 #1 [ 1759.976609] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018 [ 1759.985181] RIP: 0010:io_wq_for_each_worker.isra.0+0x24/0xa0 [ 1759.990877] Code: 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d 6f 78 53 48 8b 47 78 48 39 c5 74 4f 49 89 f5 49 89 d4 48 8d 58 e8 <8b> 13 85 d2 74 32 8d 4a 01 89 d0 f0 0f b1 0b 75 5c 09 ca 78 3d 48 [ 1760.009758] RSP: 0000:ffffb6f403603e20 EFLAGS: 00010286 [ 1760.015013] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: 0000000000000000 [ 1760.022188] RDX: ffffb6f403603e50 RSI: ffffffffb11e95b0 RDI: ffff9f73b09e9400 [ 1760.029362] RBP: ffff9f73b09e9478 R08: 000000000000000f R09: 0000000000000000 [ 1760.036536] R10: ffffffffffffff00 R11: ffffb6f403603d80 R12: ffffb6f403603e50 [ 1760.043712] R13: ffffffffb11e95b0 R14: ffffffffb28531e8 R15: ffff9f7a6fbdf548 [ 1760.050887] FS: 0000000000000000(0000) GS:ffff9f7a6fbc0000(0000) knlGS:0000000000000000 [ 1760.059025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1760.064801] CR2: ffffffffffffffe8 CR3: 00000007ab610002 CR4: 00000000007706e0 [ 1760.071976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1760.079150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1760.086325] PKRU: 55555554 [ 1760.089044] Call Trace: [ 1760.091501] <TASK> [ 1760.093612] ? show_trace_log_lvl+0x1c4/0x2df [ 1760.097995] ? show_trace_log_lvl+0x1c4/0x2df [ 1760.102377] ? __io_wq_cpu_online+0x54/0xb0 [ 1760.106584] ? __die_body.cold+0x8/0xd [ 1760.110356] ? page_fault_oops+0x134/0x170 [ 1760.114479] ? kernelmode_fixup_or_oops+0x84/0x110 [ 1760.119298] ? exc_page_fault+0xa8/0x150 [ 1760.123247] ? asm_exc_page_fault+0x22/0x30 [ 1760.127458] ? __pfx_io_wq_worker_affinity+0x10/0x10 [ 1760.132453] ? __pfx_io_wq_worker_affinity+0x10/0x10 [ 1760.137446] ? io_wq_for_each_worker.isra.0+0x24/0xa0 [ 1760.142527] __io_wq_cpu_online+0x54/0xb0 [ 1760.146558] cpuhp_invoke_callback+0x109/0x460 [ 1760.151029] ? __pfx_io_wq_cpu_offline+0x10/0x10 [ 1760.155673] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 1760.160320] cpuhp_thread_fun+0x8d/0x140 [ 1760.164266] smpboot_thread_fn+0xd3/0x1a0 [ 1760.168297] kthread+0xdd/0x100 [ 1760.171457] ? __pfx_kthread+0x10/0x10 [ 1760.175225] ret_from_fork+0x29/0x50 [ 1760.178826] </TASK> [ 1760.181022] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif nfit libnvdimm mgag200 i2c_algo_bit ioatdma drm_shmem_helper drm_kms_helper acpi_ipmi syscopyarea x86_pkg_temp_thermal sysfillrect ipmi_si intel_powerclamp sysimgblt ipmi_devintf coretemp acpi_power_meter ipmi_msghandler rapl pcspkr dca intel_pch_thermal intel_cstate ses lpc_ich intel_uncore enclosure hpilo mei_me mei acpi_tad fuse drm xfs sd_mod sg bnx2x nvme nvme_core crct10dif_pclmul crc32_pclmul nvme_common ghash_clmulni_intel smartpqi tg3 t10_pi mdio uas libcrc32c crc32c_intel scsi_transport_sas usb_storage hpwdt wmi dm_mirror dm_region_hash dm_log dm_mod [ 1760.248623] CR2: ffffffffffffffe8
A cpu hotplug callback was issued before wq->all_list was initialized. This results in a null pointer dereference. The fix is to fully setup the io_wq before calling cpuhp_state_add_instance_nocalls().
Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/x49y1ghnecs.fsf@segfault.boston.devel.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.5.5, v6.5.4, v6.5.3 |
|
#
45500dc4 |
| 07-Sep-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: break out of iowq iopoll on teardown
io-wq will retry iopoll even when it failed with -EAGAIN. If that races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers, such workers
io_uring: break out of iowq iopoll on teardown
io-wq will retry iopoll even when it failed with -EAGAIN. If that races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers, such workers might potentially infinitely spin retrying iopoll again and again and each time failing on some allocation / waiting / etc. Don't keep spinning if io-wq is dying.
Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46 |
|
#
ebdfefc0 |
| 13-Aug-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used
If we setup the ring with SQPOLL, then that polling thread has its own io-wq setup. This means that if the application uses IORIN
io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used
If we setup the ring with SQPOLL, then that polling thread has its own io-wq setup. This means that if the application uses IORING_REGISTER_IOWQ_AFF to set the io-wq affinity, we should not be setting it for the invoking task, but rather the sqpoll task.
Add an sqpoll helper that parks the thread and updates the affinity, and use that one if we're using SQPOLL.
Fixes: fe76421d1da1 ("io_uring: allow user configurable IO thread CPU affinity") Cc: stable@vger.kernel.org # 5.10+ Link: https://github.com/axboe/liburing/discussions/884 Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.45 |
|
#
22f7fb80 |
| 09-Aug-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: don't gate worker wake up success on wake_up_process()
All we really care about is finding a free worker. If said worker is already running, it's either starting new work already or
io_uring/io-wq: don't gate worker wake up success on wake_up_process()
All we really care about is finding a free worker. If said worker is already running, it's either starting new work already or it's just finishing up existing work. For the latter, we'll be finding this work item next anyway, and for the former, if the worker does go to sleep, it'll create a new worker anyway as we have pending items.
This reduces try_to_wake_up() overhead considerably:
23.16% -10.46% [kernel.kallsyms] [k] try_to_wake_up
Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
de36a15f |
| 09-Aug-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: reduce frequency of acct->lock acquisitions
When we check if we have work to run, we grab the acct lock, check, drop it, and then return the result. If we do have work to run, then r
io_uring/io-wq: reduce frequency of acct->lock acquisitions
When we check if we have work to run, we grab the acct lock, check, drop it, and then return the result. If we do have work to run, then running the work will again grab acct->lock and get the work item.
This causes us to grab acct->lock more frequently than we need to. If we have work to do, have io_acct_run_queue() return with the acct lock still acquired. io_worker_handle_work() is then always invoked with the acct lock already held.
In a simple test cases that stats files (IORING_OP_STATX always hits io-wq), we see a nice reduction in locking overhead with this change:
19.32% -12.55% [kernel.kallsyms] [k] __cmpwait_case_32 20.90% -12.07% [kernel.kallsyms] [k] queued_spin_lock_slowpath
Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
78848b9b |
| 09-Aug-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: don't grab wq->lock for worker activation
The worker free list is RCU protected, and checks for workers going away when iterating it. There's no need to hold the wq->lock around the
io_uring/io-wq: don't grab wq->lock for worker activation
The worker free list is RCU protected, and checks for workers going away when iterating it. There's no need to hold the wq->lock around the lookup.
Reviewed-by: Hao Xu <howeyxu@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
adeaa3f2 |
| 13-Jun-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: clear current->worker_private on exit
A recent fix stopped clearing PF_IO_WORKER from current->flags on exit, which meant that we can now call inc/dec running on the worker after it
io_uring/io-wq: clear current->worker_private on exit
A recent fix stopped clearing PF_IO_WORKER from current->flags on exit, which meant that we can now call inc/dec running on the worker after it has been removed if it ends up scheduling in/out as part of exit.
If this happens after an RCU grace period has passed, then the struct pointed to by current->worker_private may have been freed, and we can now be accessing memory that is freed.
Ensure this doesn't happen by clearing the task worker_private field. Both io_wq_worker_running() and io_wq_worker_sleeping() check this field before going any further, and we don't need any accounting etc done after this worker has exited.
Fixes: fd37b884003c ("io_uring/io-wq: don't clear PF_IO_WORKER on exit") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
fd37b884 |
| 12-Jun-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: don't clear PF_IO_WORKER on exit
A recent commit gated the core dumping task exit logic on current->flags remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time. This
io_uring/io-wq: don't clear PF_IO_WORKER on exit
A recent commit gated the core dumping task exit logic on current->flags remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time. This exposed a problem with the io-wq handling of that, which explicitly clears PF_IO_WORKER before calling do_exit().
The reasons for this manual clear of PF_IO_WORKER is historical, where io-wq used to potentially trigger a sleep on exit. As the io-wq thread is exiting, it should not participate any further accounting. But these days we don't need to rely on current->flags anymore, so we can safely remove the PF_IO_WORKER clearing.
Reported-by: Zorro Lang <zlang@redhat.com> Reported-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/ Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22 |
|
#
07d99096 |
| 27-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: drop outdated comment
Since the move to PF_IO_WORKER, we don't juggle memory context manually anymore. Remove that outdated part of the comment for __io_worker_idle().
Signed-off-by
io_uring/io-wq: drop outdated comment
Since the move to PF_IO_WORKER, we don't juggle memory context manually anymore. Remove that outdated part of the comment for __io_worker_idle().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.21 |
|
#
eb47943f |
| 21-Mar-2023 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: Drop struct io_wqe
Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a single io_wqe instance embedded per io_wq. Drop the extra structure in favor of accessing struct io_
io-wq: Drop struct io_wqe
Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a single io_wqe instance embedded per io_wq. Drop the extra structure in favor of accessing struct io_wq directly, cleaning up quite a bit of dereferences and backpointers.
No functional changes intended. Tested with liburing's testsuite and mmtests performance microbenchmarks. I didn't observe any performance regressions.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
dfd63baf |
| 21-Mar-2023 |
Gabriel Krisman Bertazi <krisman@suse.de> |
io-wq: Move wq accounting to io_wq
Since we now have a single io_wqe per io_wq instead of per-node, and in preparation to its removal, move the accounting into the parent structure.
Signed-off-by:
io-wq: Move wq accounting to io_wq
Since we now have a single io_wqe per io_wq instead of per-node, and in preparation to its removal, move the accounting into the parent structure.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.20, v6.1.19, v6.1.18, v6.1.17 |
|
#
da64d6db |
| 10-Mar-2023 |
Breno Leitao <leitao@debian.org> |
io_uring: One wqe per wq
Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now bound to a task, the task basically uses only the NUMA local io_wqe, and almost never changes NUMA nodes
io_uring: One wqe per wq
Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now bound to a task, the task basically uses only the NUMA local io_wqe, and almost never changes NUMA nodes, thus, the other wqes are mostly unused.
Allocate just one io_wqe embedded into io_wq, and uses all possible cpus (cpu_possible_mask) in the io_wqe->cpumask.
Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.16 |
|
#
01e68ce0 |
| 08-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers
Every now and then reports come in that are puzzled on why changing affinity on the io-wq workers fails with EINVAL. This happens beca
io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers
Every now and then reports come in that are puzzled on why changing affinity on the io-wq workers fails with EINVAL. This happens because they set PF_NO_SETAFFINITY as part of their creation, as io-wq organizes workers into groups based on what CPU they are running on.
However, this is purely an optimization and not a functional requirement. We can allow setting affinity, and just lazily update our worker to wqe mappings. If a given io-wq thread times out, it normally exits if there's no more work to do. The exception is if it's the last worker available. For the timeout case, check the affinity of the worker against group mask and exit even if it's the last worker. New workers should be created with the right mask and in the right location.
Reported-by:Daniel Dao <dqminh@cloudflare.com> Link: https://lore.kernel.org/io-uring/CA+wXwBQwgxB3_UphSny-yAP5b26meeOu1W4TwYVcD_+5gOhvPw@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19 |
|
#
e6db6f93 |
| 08-Jan-2023 |
Jens Axboe <axboe@kernel.dk> |
io_uring/io-wq: only free worker if it was allocated for creation
We have two types of task_work based creation, one is using an existing worker to setup a new one (eg when going to sleep and we hav
io_uring/io-wq: only free worker if it was allocated for creation
We have two types of task_work based creation, one is using an existing worker to setup a new one (eg when going to sleep and we have no free workers), and the other is allocating a new worker. Only the latter should be freed when we cancel task_work creation for a new worker.
Fixes: af82425c6a2d ("io_uring/io-wq: free worker if task_work creation is canceled") Reported-by: syzbot+d56ec896af3637bdb7e4@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|