Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14 |
|
#
601b5540 |
| 23-Jan-2024 |
Jan Kara <jack@suse.cz> |
blk-wbt: Fix detection of dirty-throttled tasks
commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5 upstream.
The detection of dirty-throttled tasks in blk-wbt has been subtly broken since its beginnin
blk-wbt: Fix detection of dirty-throttled tasks
commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5 upstream.
The detection of dirty-throttled tasks in blk-wbt has been subtly broken since its beginning in 2016. Namely if we are doing cgroup writeback and the throttled task is not in the root cgroup, balance_dirty_pages() will set dirty_sleep for the non-root bdi_writeback structure. However blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback structure. Thus detection of recently throttled tasks is not working in this case (we noticed this when we switched to cgroup v2 and suddently writeback was slow).
Since blk-wbt has no easy way to get to proper bdi_writeback and furthermore its intention has always been to work on the whole device rather than on individual cgroups, just move the dirty_sleep timestamp from bdi_writeback to backing_dev_info. That fixes the checking for recently throttled task and saves memory for everybody as a bonus.
CC: stable@vger.kernel.org Fixes: b57d74aff9ab ("writeback: track if we're sleeping on progress in balance_dirty_pages()") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz [axboe: fixup indentation errors] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31 |
|
#
06257fda |
| 26-May-2023 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: cleanup rwb_enabled() and wbt_disabled()
'wb_normal' will set to 0 if 'min_lat_nsec' is 0, and 'min_lat_nsec' can only be set to 0 through sysfs configuration where 'WBT_STATE_OFF_MANUAL' i
blk-wbt: cleanup rwb_enabled() and wbt_disabled()
'wb_normal' will set to 0 if 'min_lat_nsec' is 0, and 'min_lat_nsec' can only be set to 0 through sysfs configuration where 'WBT_STATE_OFF_MANUAL' is set together, in the meantime, they can only be cleared together through sysfs afterwards. Hence 'wb_normal != 0' is the same as 'rwb->enable_state != WBT_STATE_OFF_MANUAL'.
The code is redundan, hence replace the checking of 'wb_normal' to 'enable_state' in rwb_enabled() and reuse rwb_enabled() for wbt_disabled().
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230527010644.647900-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
71b8642e |
| 26-May-2023 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: remove dead code to handle wbt enable/disable with io inflight
enable or disable wbt is always called with queue freezed, so that wbt can never be enabled or disabled while io is still infl
blk-wbt: remove dead code to handle wbt enable/disable with io inflight
enable or disable wbt is always called with queue freezed, so that wbt can never be enabled or disabled while io is still inflight, and this behaviour should always hold to avoid io hang(There have been reported several times).
Therefor, the code to handle wbt enable/diskble with io inflight is not and never will be used, hence remove such dead code.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230527010644.647900-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25 |
|
#
a13bd91b |
| 14-Apr-2023 |
Yu Kuai <yukuai3@huawei.com> |
block/rq_qos: protect rq_qos apis with a new lock
commit 50e34d78815e ("block: disable the elevator int del_gendisk") move rq_qos_exit() from disk_release() to del_gendisk(), this will introduce som
block/rq_qos: protect rq_qos apis with a new lock
commit 50e34d78815e ("block: disable the elevator int del_gendisk") move rq_qos_exit() from disk_release() to del_gendisk(), this will introduce some problems:
1) If rq_qos_add() is triggered by enabling iocost/iolatency through cgroupfs, then it can concurrent with del_gendisk(), it's not safe to write 'q->rq_qos' concurrently.
2) Activate cgroup policy that is relied on rq_qos will call rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is called in the middle, null-ptr-dereference will be triggered in blkcg_activate_policy().
3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the disk, then if rq_qos_exit() from del_gendisk() is done before rq_qos_add(), then memory will be leaked.
This patch add a new disk level mutex 'rq_qos_mutex':
1) The lock will protect rq_qos_exit() directly.
2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be called from disk initialization for now because wbt can't be destructed until rq_qos_exit(), so it's safe not to protect wbt for now. Hoever, in case that rq_qos dynamically destruction is supported in the furture, this patch also protect rq_qos_add() from wbt_init() directly, this is enough because blk-sysfs already synchronize writers with disk removal.
3) For iocost and iolatency, in order to synchronize disk removal and cgroup configuration, the lock is held after blkdev_get_no_open() from blkg_conf_open_bdev(), and is released in blkg_conf_exit(). In order to fix the above memory leak, disk_live() is checked after holding the new lock.
Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230414084008.2085155-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
8a2b20a9 |
| 22-May-2023 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: fix that wbt can't be disabled by default
commit b11d31ae01e6 ("blk-wbt: remove unnecessary check in wbt_enable_default()") removes the checking of CONFIG_BLK_WBT_MQ by mistake, which is us
blk-wbt: fix that wbt can't be disabled by default
commit b11d31ae01e6 ("blk-wbt: remove unnecessary check in wbt_enable_default()") removes the checking of CONFIG_BLK_WBT_MQ by mistake, which is used to control enable or disable wbt by default.
Fix the problem by adding back the checking. This patch also do a litter cleanup to make related code more readable.
Fixes: b11d31ae01e6 ("blk-wbt: remove unnecessary check in wbt_enable_default()") Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/lkml/CAKXUXMzfKq_J9nKHGyr5P5rvUETY4B-fxoQD4sO+NYjFOfVtZA@mail.gmail.com/t/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230522121854.2928880-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10 |
|
#
ba91c849 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos
This is what about half of the users already want, and it's only going to grow more.
Signed-off-by: Christoph Hellwig <hch@lst.
blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos
This is what about half of the users already want, and it's only going to grow more.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
3963d84d |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: constify rq_qos_ops
These op vectors are constant, so mark them const.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun He
blk-rq-qos: constify rq_qos_ops
These op vectors are constant, so mark them const.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
ce57b558 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: make rq_qos_add and rq_qos_del more useful
Switch to passing a gendisk, and make rq_qos_add initialize all required fields and drop the not required q argument from rq_qos_del.
Signed-o
blk-rq-qos: make rq_qos_add and rq_qos_del more useful
Switch to passing a gendisk, and make rq_qos_add initialize all required fields and drop the not required q argument from rq_qos_del.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
4e1d91ae |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-wbt: open code wbt_queue_depth_changed in wbt_init
wbt_queue_depth_changed just updates a field and calls another function. Open code it in wbt_init, so that the local queue variable can be used
blk-wbt: open code wbt_queue_depth_changed in wbt_init
wbt_queue_depth_changed just updates a field and calls another function. Open code it in wbt_init, so that the local queue variable can be used instead of the one stored in the rq_qos. This will allow delaying that rq_qos->queue assignment in a subsequent patch.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0bc65bd4 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-wbt: move private information from blk-wbt.h to blk-wbt.c
A large part of blk-wbt.h is only used in blk-wbt.c, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun He
blk-wbt: move private information from blk-wbt.h to blk-wbt.c
A large part of blk-wbt.h is only used in blk-wbt.c, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
958f2965 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-wbt: pass a gendisk to wbt_init
Pass a gendisk to wbt_init to prepare for phasing out usage of the request_queue in the blk-cgroup code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-b
blk-wbt: pass a gendisk to wbt_init
Pass a gendisk to wbt_init to prepare for phasing out usage of the request_queue in the blk-cgroup code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
04aad37b |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-wbt: pass a gendisk to wbt_{enable,disable}_default
Pass a gendisk to wbt_enable_default and wbt_disable_default to prepare for phasing out usage of the request_queue in the blk-cgroup code.
Si
blk-wbt: pass a gendisk to wbt_{enable,disable}_default
Pass a gendisk to wbt_enable_default and wbt_disable_default to prepare for phasing out usage of the request_queue in the blk-cgroup code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3 |
|
#
671fae5e |
| 19-Oct-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: don't enable throttling if default elevator is bfq
Commit b5dc5d4d1f4f ("block,bfq: Disable writeback throttling") tries to disable wbt for bfq, it's done by calling wbt_disable_default() i
blk-wbt: don't enable throttling if default elevator is bfq
Commit b5dc5d4d1f4f ("block,bfq: Disable writeback throttling") tries to disable wbt for bfq, it's done by calling wbt_disable_default() in bfq_init_queue(). However, wbt is still enabled if default elevator is bfq:
device_add_disk elevator_init_mq bfq_init_queue wbt_disable_default -> done nothing
blk_register_queue wbt_enable_default -> wbt is enabled
Fix the problem by adding a new flag ELEVATOR_FLAG_DISBALE_WBT, bfq will set the flag in bfq_init_queue, and following wbt_enable_default() won't enable wbt while the flag is set.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019121518.3865235-7-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
3642ef4d |
| 19-Oct-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: don't show valid wbt_lat_usec in sysfs while wbt is disabled
Currently, if wbt is initialized and then disabled by wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which wil
blk-wbt: don't show valid wbt_lat_usec in sysfs while wbt is disabled
Currently, if wbt is initialized and then disabled by wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which will confuse users that wbt is still enabled.
This patch shows wbt_lat_usec as zero if it's disabled.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reported-and-tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019121518.3865235-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
a9a236d2 |
| 19-Oct-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: make enable_state more accurate
Currently, if user disable wbt through sysfs, 'enable_state' will be 'WBT_STATE_ON_MANUAL', which will be confusing. Add a new state 'WBT_STATE_OFF_MANUAL' t
blk-wbt: make enable_state more accurate
Currently, if user disable wbt through sysfs, 'enable_state' will be 'WBT_STATE_ON_MANUAL', which will be confusing. Add a new state 'WBT_STATE_OFF_MANUAL' to cover that case.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019121518.3865235-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
b11d31ae |
| 19-Oct-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: remove unnecessary check in wbt_enable_default()
If CONFIG_BLK_WBT_MQ is disabled, wbt_init() won't do anything.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig
blk-wbt: remove unnecessary check in wbt_enable_default()
If CONFIG_BLK_WBT_MQ is disabled, wbt_init() won't do anything.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221019121518.3865235-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.0.2, v5.15.74, v5.15.73, v6.0.1 |
|
#
285febab |
| 09-Oct-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
commit 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized") moves wbt_set_write_cache() before rq_qos_add(), which
blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
commit 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized") moves wbt_set_write_cache() before rq_qos_add(), which is wrong because wbt_rq_qos() is still NULL.
Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc' directly. Noted that this patch also remove the redundant setting of 'rab->wc'.
Fixes: 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized") Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68 |
|
#
8c5035df |
| 13-Sep-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: call rq_qos_add() after wb_normal is initialized
Our test found a problem that wbt inflight counter is negative, which will cause io hang(noted that this problem doesn't exist in mainline):
blk-wbt: call rq_qos_add() after wb_normal is initialized
Our test found a problem that wbt inflight counter is negative, which will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io add_disk blk_register_queue wbt_enable_default wbt_init rq_qos_add // wb_normal is still 0 /* * in mainline, disk can't be opened before * bdev_add(), however, in old kernels, disk * can be opened before blk_register_queue(). */ blkdev_issue_flush // disk size is 0, however, it's not checked submit_bio_wait submit_bio blk_mq_submit_bio rq_qos_throttle wbt_wait bio_to_wbt_flags rwb_enabled // wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos); wbt_update_limits // wb_normal is initialized rq_qos_track wbt_track rq->wbt_flags |= bio_to_wbt_flags(rwb, bio); // wb_normal is not 0,wbt_flags will be set t3: io completion blk_mq_free_request rq_qos_done wbt_done wbt_is_tracked // return true __wbt_done wbt_rqw_done atomic_dec_return(&rqw->inflight); // inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor. 2) Root cause is that wbt call rq_qos_add() before wb_normal is initialized.
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56 |
|
#
14a6e2eb |
| 20-Jul-2022 |
Jinke Han <hanjinke.666@bytedance.com> |
block: don't allow the same type rq_qos add more than once
In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn.
The reason can be described as fol
block: don't allow the same type rq_qos add more than once
In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn.
The reason can be described as follows:
cpu 0 cpu 1 ioc_qos_write ioc_qos_write
ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ... rq_qos_add(q, rqos); } ... rq_qos_add(q, rqos); ... }
When the io.cost.qos file is written by two cpus concurrently, rq_qos may be added to one disk twice. In that case, there will be two iocs enabled and running on one disk. They own different iocgs on their active list. In the ioc_timer_fn function, because of the iocgs from two iocs have the same root iocg, the root iocg's walk_list may be overwritten by each other and this leads to list add/del corruptions in building or destroying the inner_walk list.
And so far, the blk-rq-qos framework works in case that one instance for one type rq_qos per queue by default. This patch make this explicit and also fix the crash above.
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.55 |
|
#
16458cf3 |
| 14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the fu
block: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the function arguments and also a structure member that hold a request operation and flags from 'rw' into 'opf'.
This patch does not change any functionality.
Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
77e7ffd7 |
| 14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: Use enum req_op where appropriate
Change the type of the arguments that are used to pass a REQ_OP_* value from int or unsigned int into enum req_op to improve static type checking.
Cc: Chris
block: Use enum req_op where appropriate
Change the type of the arguments that are used to pass a REQ_OP_* value from int or unsigned int into enum req_op to improve static type checking.
Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14 |
|
#
480d42dc |
| 19-Oct-2021 |
Andrea Righi <andrea.righi@canonical.com> |
blk-wbt: prevent NULL pointer dereference in wb_timer_fn
The timer callback used to evaluate if the latency is exceeded can be executed after the corresponding disk has been released, causing the fo
blk-wbt: prevent NULL pointer dereference in wb_timer_fn
The timer callback used to evaluate if the latency is exceeded can be executed after the corresponding disk has been released, causing the following NULL pointer dereference:
[ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 119.987617] #PF: supervisor read access in kernel mode [ 119.987971] #PF: error_code(0x0000) - not-present page [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0 [ 119.988697] Oops: 0000 [#1] SMP NOPTI [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0 [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00 [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246 [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780 [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000 [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500 [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00 [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000 [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0 [ 119.995906] Call Trace: [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30 [ 119.996505] blk_stat_timer_fn+0x138/0x140 [ 119.996830] call_timer_fn+0x2b/0x100 [ 119.997136] __run_timers.part.0+0x1d1/0x240 [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20 [ 119.997826] ? ktime_get+0x3e/0xa0 [ 119.998110] ? native_apic_msr_write+0x2c/0x30 [ 119.998456] ? lapic_next_event+0x20/0x30 [ 119.998779] ? clockevents_program_event+0x94/0xf0 [ 119.999150] run_timer_softirq+0x2a/0x50 [ 119.999465] __do_softirq+0xcb/0x26f [ 119.999764] irq_exit_rcu+0x8c/0xb0 [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90 [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20
In this case simply return from the timer callback (no action required) to prevent the NULL pointer dereference.
BugLink: https://bugs.launchpad.net/bugs/1947557 Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/ Fixes: 34dbad5d26e2 ("blk-stat: convert to callback-based statistics reporting") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
d7eadffc |
| 09-Oct-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
commit 285febabac4a16655372d23ff43e89ff6f216691 upstream.
commit 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialize
blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
commit 285febabac4a16655372d23ff43e89ff6f216691 upstream.
commit 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized") moves wbt_set_write_cache() before rq_qos_add(), which is wrong because wbt_rq_qos() is still NULL.
Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc' directly. Noted that this patch also remove the redundant setting of 'rab->wc'.
Fixes: 8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized") Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
e3e5baa3 |
| 13-Sep-2022 |
Yu Kuai <yukuai3@huawei.com> |
blk-wbt: call rq_qos_add() after wb_normal is initialized
commit 8c5035dfbb9475b67c82b3fdb7351236525bf52b upstream.
Our test found a problem that wbt inflight counter is negative, which will cause
blk-wbt: call rq_qos_add() after wb_normal is initialized
commit 8c5035dfbb9475b67c82b3fdb7351236525bf52b upstream.
Our test found a problem that wbt inflight counter is negative, which will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io add_disk blk_register_queue wbt_enable_default wbt_init rq_qos_add // wb_normal is still 0 /* * in mainline, disk can't be opened before * bdev_add(), however, in old kernels, disk * can be opened before blk_register_queue(). */ blkdev_issue_flush // disk size is 0, however, it's not checked submit_bio_wait submit_bio blk_mq_submit_bio rq_qos_throttle wbt_wait bio_to_wbt_flags rwb_enabled // wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos); wbt_update_limits // wb_normal is initialized rq_qos_track wbt_track rq->wbt_flags |= bio_to_wbt_flags(rwb, bio); // wb_normal is not 0,wbt_flags will be set t3: io completion blk_mq_free_request rq_qos_done wbt_done wbt_is_tracked // return true __wbt_done wbt_rqw_done atomic_dec_return(&rqw->inflight); // inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor. 2) Root cause is that wbt call rq_qos_add() before wb_normal is initialized.
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
0b7f5d7a |
| 20-Jul-2022 |
Jinke Han <hanjinke.666@bytedance.com> |
block: don't allow the same type rq_qos add more than once
[ Upstream commit 14a6e2eb7df5c7897c15b109cba29ab0c4a791b6 ]
In our test of iocost, we encountered some list add/del corruptions of inner_
block: don't allow the same type rq_qos add more than once
[ Upstream commit 14a6e2eb7df5c7897c15b109cba29ab0c4a791b6 ]
In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn.
The reason can be described as follows:
cpu 0 cpu 1 ioc_qos_write ioc_qos_write
ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ioc = q_to_ioc(queue); if (!ioc) { ioc = kzalloc(); ... rq_qos_add(q, rqos); } ... rq_qos_add(q, rqos); ... }
When the io.cost.qos file is written by two cpus concurrently, rq_qos may be added to one disk twice. In that case, there will be two iocs enabled and running on one disk. They own different iocgs on their active list. In the ioc_timer_fn function, because of the iocgs from two iocs have the same root iocg, the root iocg's walk_list may be overwritten by each other and this leads to list add/del corruptions in building or destroying the inner_walk list.
And so far, the blk-rq-qos framework works in case that one instance for one type rq_qos per queue by default. This patch make this explicit and also fix the crash above.
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|