History log of /openbmc/linux/io_uring/io-wq.c (Results 1 – 25 of 31)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.35, v6.6.34, v6.6.33
# 1bbadf95 04-Jun-2024 Su Hui <suhui@nfschina.com>

io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()

[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]

Clang static checker (scan-build) warning:
o_uring/io-wq.c:line 1051,

io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()

[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]

Clang static checker (scan-build) warning:
o_uring/io-wq.c:line 1051, column 3
The expression is an uninitialized value. The computed value will
also be garbage.

'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is
not fully initialized. Change the order of assignment for 'match' to fix
this problem.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31
# ab702c34 07-May-2024 Breno Leitao <leitao@debian.org>

io_uring/io-wq: Use set_bit() and test_bit() at worker->flags

[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]

Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq
to

io_uring/io-wq: Use set_bit() and test_bit() at worker->flags

[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]

Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq
to address potential data races.

The structure io_worker->flags may be accessed through various data
paths, leading to concurrency issues. When KCSAN is enabled, it reveals
data races occurring in io_worker_handle_work and
io_wq_activate_free_worker functions.

BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker
write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28:
io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569)
io_wq_worker (io_uring/io-wq.c:?)
<snip>

read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5:
io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285)
io_wq_enqueue (io_uring/io-wq.c:947)
io_queue_iowq (io_uring/io_uring.c:524)
io_req_task_submit (io_uring/io_uring.c:1511)
io_handle_tw_list (io_uring/io_uring.c:1198)
<snip>

Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of
git://git.kernel.org/pub/scm/virt/kvm/kvm").

These races involve writes and reads to the same memory location by
different tasks running on different CPUs. To mitigate this, refactor
the code to use atomic operations such as set_bit(), test_bit(), and
clear_bit() instead of basic "and" and "or" operations. This ensures
thread-safe manipulation of worker flags.

Also, move `create_index` to avoid holes in the structure.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240507170002.2269003-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 91215f70ea85 ("io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.35, v6.6.34, v6.6.33
# 1bbadf95 04-Jun-2024 Su Hui <suhui@nfschina.com>

io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()

[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]

Clang static checker (scan-build) warning:
o_uring/io-wq.c:line 1051,

io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()

[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]

Clang static checker (scan-build) warning:
o_uring/io-wq.c:line 1051, column 3
The expression is an uninitialized value. The computed value will
also be garbage.

'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is
not fully initialized. Change the order of assignment for 'match' to fix
this problem.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31
# ab702c34 07-May-2024 Breno Leitao <leitao@debian.org>

io_uring/io-wq: Use set_bit() and test_bit() at worker->flags

[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]

Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq
to

io_uring/io-wq: Use set_bit() and test_bit() at worker->flags

[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]

Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq
to address potential data races.

The structure io_worker->flags may be accessed through various data
paths, leading to concurrency issues. When KCSAN is enabled, it reveals
data races occurring in io_worker_handle_work and
io_wq_activate_free_worker functions.

BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker
write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28:
io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569)
io_wq_worker (io_uring/io-wq.c:?)
<snip>

read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5:
io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285)
io_wq_enqueue (io_uring/io-wq.c:947)
io_queue_iowq (io_uring/io_uring.c:524)
io_req_task_submit (io_uring/io_uring.c:1511)
io_handle_tw_list (io_uring/io_uring.c:1198)
<snip>

Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of
git://git.kernel.org/pub/scm/virt/kvm/kvm").

These races involve writes and reads to the same memory location by
different tasks running on different CPUs. To mitigate this, refactor
the code to use atomic operations such as set_bit(), test_bit(), and
clear_bit() instead of basic "and" and "or" operations. This ensures
thread-safe manipulation of worker flags.

Also, move `create_index` to avoid holes in the structure.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240507170002.2269003-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 91215f70ea85 ("io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28
# f32f810d 15-Apr-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active

io-wq: write next_work before dropping acct_lock

[ Upstream commit 068c27e32e51e94e4a9eb30ae85f4097a3602980 ]

Commit 361aee450c6e ("io-wq: add intermediate work step between pending
list and active work") closed a race between a cancellation and the work
being removed from the wq for execution. To ensure the request is
always reachable by the cancellation, we need to move it within the wq
lock, which also synchronizes the cancellation. But commit
42abc95f05bf ("io-wq: decouple work_list protection from the big
wqe->lock") replaced the wq lock here and accidentally reintroduced the
race by releasing the acct_lock too early.

In other words:

worker | cancellation
work = io_get_next_work() |
raw_spin_unlock(&acct->lock); |
|
| io_acct_cancel_pending_work
| io_wq_worker_cancel()
worker->next_work = work

Using acct_lock is still enough since we synchronize on it on
io_acct_cancel_pending_work.

Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20240416021054.3940-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6
# 0f8baa3c 05-Oct-2023 Jeff Moyer <jmoyer@redhat.com>

io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()

I received a bug report with the following signature:

[ 1759.937637] BUG: unable to handle page fault for address: ffff

io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()

I received a bug report with the following signature:

[ 1759.937637] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 1759.944564] #PF: supervisor read access in kernel mode
[ 1759.949732] #PF: error_code(0x0000) - not-present page
[ 1759.954901] PGD 7ab615067 P4D 7ab615067 PUD 7ab617067 PMD 0
[ 1759.960596] Oops: 0000 1 PREEMPT SMP PTI
[ 1759.964804] CPU: 15 PID: 109 Comm: cpuhp/15 Kdump: loaded Tainted: G X ------- — 5.14.0-362.3.1.el9_3.x86_64 #1
[ 1759.976609] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018
[ 1759.985181] RIP: 0010:io_wq_for_each_worker.isra.0+0x24/0xa0
[ 1759.990877] Code: 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d 6f 78 53 48 8b 47 78 48 39 c5 74 4f 49 89 f5 49 89 d4 48 8d 58 e8 <8b> 13 85 d2 74 32 8d 4a 01 89 d0 f0 0f b1 0b 75 5c 09 ca 78 3d 48
[ 1760.009758] RSP: 0000:ffffb6f403603e20 EFLAGS: 00010286
[ 1760.015013] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: 0000000000000000
[ 1760.022188] RDX: ffffb6f403603e50 RSI: ffffffffb11e95b0 RDI: ffff9f73b09e9400
[ 1760.029362] RBP: ffff9f73b09e9478 R08: 000000000000000f R09: 0000000000000000
[ 1760.036536] R10: ffffffffffffff00 R11: ffffb6f403603d80 R12: ffffb6f403603e50
[ 1760.043712] R13: ffffffffb11e95b0 R14: ffffffffb28531e8 R15: ffff9f7a6fbdf548
[ 1760.050887] FS: 0000000000000000(0000) GS:ffff9f7a6fbc0000(0000) knlGS:0000000000000000
[ 1760.059025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1760.064801] CR2: ffffffffffffffe8 CR3: 00000007ab610002 CR4: 00000000007706e0
[ 1760.071976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1760.079150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1760.086325] PKRU: 55555554
[ 1760.089044] Call Trace:
[ 1760.091501] <TASK>
[ 1760.093612] ? show_trace_log_lvl+0x1c4/0x2df
[ 1760.097995] ? show_trace_log_lvl+0x1c4/0x2df
[ 1760.102377] ? __io_wq_cpu_online+0x54/0xb0
[ 1760.106584] ? __die_body.cold+0x8/0xd
[ 1760.110356] ? page_fault_oops+0x134/0x170
[ 1760.114479] ? kernelmode_fixup_or_oops+0x84/0x110
[ 1760.119298] ? exc_page_fault+0xa8/0x150
[ 1760.123247] ? asm_exc_page_fault+0x22/0x30
[ 1760.127458] ? __pfx_io_wq_worker_affinity+0x10/0x10
[ 1760.132453] ? __pfx_io_wq_worker_affinity+0x10/0x10
[ 1760.137446] ? io_wq_for_each_worker.isra.0+0x24/0xa0
[ 1760.142527] __io_wq_cpu_online+0x54/0xb0
[ 1760.146558] cpuhp_invoke_callback+0x109/0x460
[ 1760.151029] ? __pfx_io_wq_cpu_offline+0x10/0x10
[ 1760.155673] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 1760.160320] cpuhp_thread_fun+0x8d/0x140
[ 1760.164266] smpboot_thread_fn+0xd3/0x1a0
[ 1760.168297] kthread+0xdd/0x100
[ 1760.171457] ? __pfx_kthread+0x10/0x10
[ 1760.175225] ret_from_fork+0x29/0x50
[ 1760.178826] </TASK>
[ 1760.181022] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif nfit libnvdimm mgag200 i2c_algo_bit ioatdma drm_shmem_helper drm_kms_helper acpi_ipmi syscopyarea x86_pkg_temp_thermal sysfillrect ipmi_si intel_powerclamp sysimgblt ipmi_devintf coretemp acpi_power_meter ipmi_msghandler rapl pcspkr dca intel_pch_thermal intel_cstate ses lpc_ich intel_uncore enclosure hpilo mei_me mei acpi_tad fuse drm xfs sd_mod sg bnx2x nvme nvme_core crct10dif_pclmul crc32_pclmul nvme_common ghash_clmulni_intel smartpqi tg3 t10_pi mdio uas libcrc32c crc32c_intel scsi_transport_sas usb_storage hpwdt wmi dm_mirror dm_region_hash dm_log dm_mod
[ 1760.248623] CR2: ffffffffffffffe8

A cpu hotplug callback was issued before wq->all_list was initialized.
This results in a null pointer dereference. The fix is to fully setup
the io_wq before calling cpuhp_state_add_instance_nocalls().

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/x49y1ghnecs.fsf@segfault.boston.devel.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.5.5, v6.5.4, v6.5.3
# 45500dc4 07-Sep-2023 Pavel Begunkov <asml.silence@gmail.com>

io_uring: break out of iowq iopoll on teardown

io-wq will retry iopoll even when it failed with -EAGAIN. If that
races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers,
such workers

io_uring: break out of iowq iopoll on teardown

io-wq will retry iopoll even when it failed with -EAGAIN. If that
races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers,
such workers might potentially infinitely spin retrying iopoll again and
again and each time failing on some allocation / waiting / etc. Don't
keep spinning if io-wq is dying.

Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46
# ebdfefc0 13-Aug-2023 Jens Axboe <axboe@kernel.dk>

io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used

If we setup the ring with SQPOLL, then that polling thread has its
own io-wq setup. This means that if the application uses
IORIN

io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used

If we setup the ring with SQPOLL, then that polling thread has its
own io-wq setup. This means that if the application uses
IORING_REGISTER_IOWQ_AFF to set the io-wq affinity, we should not be
setting it for the invoking task, but rather the sqpoll task.

Add an sqpoll helper that parks the thread and updates the affinity,
and use that one if we're using SQPOLL.

Fixes: fe76421d1da1 ("io_uring: allow user configurable IO thread CPU affinity")
Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/discussions/884
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.1.45
# 22f7fb80 09-Aug-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: don't gate worker wake up success on wake_up_process()

All we really care about is finding a free worker. If said worker is
already running, it's either starting new work already or

io_uring/io-wq: don't gate worker wake up success on wake_up_process()

All we really care about is finding a free worker. If said worker is
already running, it's either starting new work already or it's just
finishing up existing work. For the latter, we'll be finding this work
item next anyway, and for the former, if the worker does go to sleep,
it'll create a new worker anyway as we have pending items.

This reduces try_to_wake_up() overhead considerably:

23.16% -10.46% [kernel.kallsyms] [k] try_to_wake_up

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# de36a15f 09-Aug-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: reduce frequency of acct->lock acquisitions

When we check if we have work to run, we grab the acct lock, check,
drop it, and then return the result. If we do have work to run, then
r

io_uring/io-wq: reduce frequency of acct->lock acquisitions

When we check if we have work to run, we grab the acct lock, check,
drop it, and then return the result. If we do have work to run, then
running the work will again grab acct->lock and get the work item.

This causes us to grab acct->lock more frequently than we need to.
If we have work to do, have io_acct_run_queue() return with the acct
lock still acquired. io_worker_handle_work() is then always invoked
with the acct lock already held.

In a simple test cases that stats files (IORING_OP_STATX always hits
io-wq), we see a nice reduction in locking overhead with this change:

19.32% -12.55% [kernel.kallsyms] [k] __cmpwait_case_32
20.90% -12.07% [kernel.kallsyms] [k] queued_spin_lock_slowpath

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 78848b9b 09-Aug-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: don't grab wq->lock for worker activation

The worker free list is RCU protected, and checks for workers going away
when iterating it. There's no need to hold the wq->lock around the

io_uring/io-wq: don't grab wq->lock for worker activation

The worker free list is RCU protected, and checks for workers going away
when iterating it. There's no need to hold the wq->lock around the
lookup.

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34
# adeaa3f2 13-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: clear current->worker_private on exit

A recent fix stopped clearing PF_IO_WORKER from current->flags on exit,
which meant that we can now call inc/dec running on the worker after it

io_uring/io-wq: clear current->worker_private on exit

A recent fix stopped clearing PF_IO_WORKER from current->flags on exit,
which meant that we can now call inc/dec running on the worker after it
has been removed if it ends up scheduling in/out as part of exit.

If this happens after an RCU grace period has passed, then the struct
pointed to by current->worker_private may have been freed, and we can
now be accessing memory that is freed.

Ensure this doesn't happen by clearing the task worker_private field.
Both io_wq_worker_running() and io_wq_worker_sleeping() check this
field before going any further, and we don't need any accounting etc
done after this worker has exited.

Fixes: fd37b884003c ("io_uring/io-wq: don't clear PF_IO_WORKER on exit")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# fd37b884 12-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: don't clear PF_IO_WORKER on exit

A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This

io_uring/io-wq: don't clear PF_IO_WORKER on exit

A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().

The reasons for this manual clear of PF_IO_WORKER is historical, where
io-wq used to potentially trigger a sleep on exit. As the io-wq thread
is exiting, it should not participate any further accounting. But these
days we don't need to rely on current->flags anymore, so we can safely
remove the PF_IO_WORKER clearing.

Reported-by: Zorro Lang <zlang@redhat.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/all/ZIZSPyzReZkGBEFy@dread.disaster.area/
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22
# 07d99096 27-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: drop outdated comment

Since the move to PF_IO_WORKER, we don't juggle memory context manually
anymore. Remove that outdated part of the comment for __io_worker_idle().

Signed-off-by

io_uring/io-wq: drop outdated comment

Since the move to PF_IO_WORKER, we don't juggle memory context manually
anymore. Remove that outdated part of the comment for __io_worker_idle().

Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.1.21
# eb47943f 21-Mar-2023 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: Drop struct io_wqe

Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a
single io_wqe instance embedded per io_wq. Drop the extra structure in
favor of accessing struct io_

io-wq: Drop struct io_wqe

Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a
single io_wqe instance embedded per io_wq. Drop the extra structure in
favor of accessing struct io_wq directly, cleaning up quite a bit of
dereferences and backpointers.

No functional changes intended. Tested with liburing's testsuite
and mmtests performance microbenchmarks. I didn't observe any
performance regressions.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# dfd63baf 21-Mar-2023 Gabriel Krisman Bertazi <krisman@suse.de>

io-wq: Move wq accounting to io_wq

Since we now have a single io_wqe per io_wq instead of per-node, and in
preparation to its removal, move the accounting into the parent
structure.

Signed-off-by:

io-wq: Move wq accounting to io_wq

Since we now have a single io_wqe per io_wq instead of per-node, and in
preparation to its removal, move the accounting into the parent
structure.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.1.20, v6.1.19, v6.1.18, v6.1.17
# da64d6db 10-Mar-2023 Breno Leitao <leitao@debian.org>

io_uring: One wqe per wq

Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now
bound to a task, the task basically uses only the NUMA local io_wqe, and
almost never changes NUMA nodes

io_uring: One wqe per wq

Right now io_wq allocates one io_wqe per NUMA node. As io_wq is now
bound to a task, the task basically uses only the NUMA local io_wqe, and
almost never changes NUMA nodes, thus, the other wqes are mostly
unused.

Allocate just one io_wqe embedded into io_wq, and uses all possible cpus
(cpu_possible_mask) in the io_wqe->cpumask.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20230310201107.4020580-1-leitao@debian.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.1.16
# 01e68ce0 08-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers

Every now and then reports come in that are puzzled on why changing
affinity on the io-wq workers fails with EINVAL. This happens beca

io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers

Every now and then reports come in that are puzzled on why changing
affinity on the io-wq workers fails with EINVAL. This happens because they
set PF_NO_SETAFFINITY as part of their creation, as io-wq organizes
workers into groups based on what CPU they are running on.

However, this is purely an optimization and not a functional requirement.
We can allow setting affinity, and just lazily update our worker to wqe
mappings. If a given io-wq thread times out, it normally exits if there's
no more work to do. The exception is if it's the last worker available.
For the timeout case, check the affinity of the worker against group mask
and exit even if it's the last worker. New workers should be created with
the right mask and in the right location.

Reported-by:Daniel Dao <dqminh@cloudflare.com>
Link: https://lore.kernel.org/io-uring/CA+wXwBQwgxB3_UphSny-yAP5b26meeOu1W4TwYVcD_+5gOhvPw@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


Revision tags: v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19
# e6db6f93 08-Jan-2023 Jens Axboe <axboe@kernel.dk>

io_uring/io-wq: only free worker if it was allocated for creation

We have two types of task_work based creation, one is using an existing
worker to setup a new one (eg when going to sleep and we hav

io_uring/io-wq: only free worker if it was allocated for creation

We have two types of task_work based creation, one is using an existing
worker to setup a new one (eg when going to sleep and we have no free
workers), and the other is allocating a new worker. Only the latter
should be freed when we cancel task_work creation for a new worker.

Fixes: af82425c6a2d ("io_uring/io-wq: free worker if task_work creation is canceled")
Reported-by: syzbot+d56ec896af3637bdb7e4@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


12