#
ed454332 |
| 05-Jan-2025 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.69' into for/openbmc/dev-6.6
This is the 6.6.69 stable release
|
Revision tags: v6.6.69, v6.6.68, v6.6.67 |
|
#
ee18012c |
| 18-Dec-2024 |
Ming Lei <ming.lei@redhat.com> |
block: avoid to reuse `hctx` not removed from cpuhp callback list
commit 85672ca9ceeaa1dcf2777a7048af5f4aee3fd02b upstream.
If the 'hctx' isn't removed from the cpuhp callback list, it can't be reused; otherwise a use-after-free may be triggered.
Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202412172217.b906db7c-lkp@intel.com Tested-by: kernel test robot <oliver.sang@intel.com> Fixes: 22465bbac53c ("blk-mq: move cpuhp callback registering out of q->sysfs_lock") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241218101617.3275704-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
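The idea in a minimal C sketch (the helper name and list walk are illustrative assumptions, not the actual patch; the real change lives in the dead-hctx reuse path of blk_mq_alloc_and_init_hctx()):

  /* Sketch: only reuse a dead hctx whose cpuhp callback node has really
   * been unhashed; otherwise the cpuhp core may still dereference it. */
  static struct blk_mq_hw_ctx *reuse_dead_hctx_sketch(struct list_head *dead)
  {
  	struct blk_mq_hw_ctx *hctx;

  	list_for_each_entry(hctx, dead, hctx_list) {
  		if (hlist_unhashed(&hctx->cpuhp_online)) {
  			list_del_init(&hctx->hctx_list);
  			return hctx;
  		}
  	}
  	return NULL;	/* nothing safe to reuse: allocate a fresh hctx */
  }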
|
Revision tags: v6.6.66, v6.6.65, v6.6.64 |
|
#
58bf9358 |
| 06-Dec-2024 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: move cpuhp callback registering out of q->sysfs_lock
[ Upstream commit 22465bbac53c821319089016f268a2437de9b00a ]
Registering and unregistering a cpuhp callback requires the global cpu hotplug lock, which is used everywhere. Meanwhile, q->sysfs_lock is used almost everywhere in the block layer.
It is easy to trigger a lockdep warning [1] by connecting the two locks.
Fix the warning by moving blk-mq's cpuhp callback registering out of q->sysfs_lock. Add one dedicated global lock to cover registering and unregistering the hctx's cpuhp callbacks; this is safe because an hctx is guaranteed to be live as long as its request_queue is live.
[1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/
Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Newman <peternewman@google.com> Cc: Babu Moger <babu.moger@amd.com> Reported-by: Luck Tony <tony.luck@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241206111611.978870-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
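A minimal sketch of the locking shape (the helper name is an assumption; the cpuhp calls and state constants are real kernel APIs):

  #include <linux/cpuhotplug.h>
  #include <linux/mutex.h>

  /* One dedicated global mutex for cpuhp (un)registration, so lockdep
   * never sees q->sysfs_lock connected to the cpu hotplug lock. */
  static DEFINE_MUTEX(blk_mq_cpuhp_lock);

  static void blk_mq_add_hctx_cpuhp_sketch(struct blk_mq_hw_ctx *hctx)
  {
  	mutex_lock(&blk_mq_cpuhp_lock);
  	cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
  					 &hctx->cpuhp_online);
  	cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD,
  					 &hctx->cpuhp_dead);
  	mutex_unlock(&blk_mq_cpuhp_lock);
  }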
|
#
079fcc92 |
| 06-Dec-2024 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: register cpuhp callback after hctx is added to xarray table
[ Upstream commit 4bf485a7db5d82ddd0f3ad2b299893199090375e ]
We need to retrieve the 'hctx' from the xarray table in the cpuhp callback, so the callback should be registered only after the 'hctx' has been added to the xarray table.
Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Newman <peternewman@google.com> Cc: Babu Moger <babu.moger@amd.com> Cc: Luck Tony <tony.luck@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241206111611.978870-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
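The required ordering, as a hedged sketch (the function name is illustrative; xa_insert() and q->hctx_table are real):

  /* Publish the hctx in the xarray first; only then may the cpuhp
   * callback, which looks the hctx up in q->hctx_table, be registered. */
  static int init_hctx_ordered_sketch(struct request_queue *q,
  				    struct blk_mq_hw_ctx *hctx,
  				    unsigned int hctx_idx)
  {
  	int ret = xa_insert(&q->hctx_table, hctx_idx, hctx, GFP_KERNEL);

  	if (ret)
  		return ret;
  	blk_mq_add_hctx_cpuhp_sketch(hctx);	/* see previous sketch */
  	return 0;
  }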
|
#
ecc23d0a |
| 09-Dec-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.64' into for/openbmc/dev-6.6
This is the 6.6.64 stable release
|
Revision tags: v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59 |
|
#
68a69ed5 |
| 22-Oct-2024 |
Bart Van Assche <bvanassche@acm.org> |
blk-mq: Make blk_mq_quiesce_tagset() hold the tag list mutex less long
commit ccd9e252c515ac5a3ed04a414c95d1307d17f159 upstream.
Make sure that the tag_list_lock mutex is not held any longer than necessary. This change reduces latency if e.g. blk_mq_quiesce_tagset() is called concurrently from more than one thread. This function is used by the NVMe core and also by the UFS driver.
Reported-by: Peter Wang <peter.wang@mediatek.com> Cc: Chao Leng <lengchao@huawei.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: stable@vger.kernel.org Fixes: 414dd48e882c ("blk-mq: add tagset quiesce interface") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241022181617.2716173-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
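The pattern, sketched under the assumption of a BLK_MQ_F_BLOCKING tag set (the real function also skips queues flagged to skip tagset quiesce):

  /* Start quiescing every queue under tag_list_lock, but run the slow
   * SRCU grace-period wait only after dropping the mutex. */
  void quiesce_tagset_sketch(struct blk_mq_tag_set *set)
  {
  	struct request_queue *q;

  	mutex_lock(&set->tag_list_lock);
  	list_for_each_entry(q, &set->tag_list, tag_set_list)
  		blk_mq_quiesce_queue_nowait(q);
  	mutex_unlock(&set->tag_list_lock);

  	synchronize_srcu(set->srcu);	/* expensive wait, mutex not held */
  }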
|
Revision tags: v6.6.58, v6.6.57 |
|
#
e95080fb |
| 14-Oct-2024 |
Muchun Song <songmuchun@bytedance.com> |
block: fix ordering between checking BLK_MQ_S_STOPPED request adding
commit 96a9fe64bfd486ebeeacf1e6011801ffe89dae18 upstream.
Supposing first scenario with a virtio_blk driver.
CPU0                                          CPU1

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                              virtblk_done()
  blk_mq_request_bypass_insert()  1) store
                                                blk_mq_start_stopped_hw_queue()
                                                  clear_bit(BLK_MQ_S_STOPPED)  3) store
                                                  blk_mq_run_hw_queue()
                                                    if (!blk_mq_hctx_has_pending())  4) load
                                                      return
                                                    blk_mq_sched_dispatch_requests()
  blk_mq_run_hw_queue()
    if (!blk_mq_hctx_has_pending())
      return
    blk_mq_sched_dispatch_requests()
      if (blk_mq_hctx_stopped())  2) load
        return
      __blk_mq_sched_dispatch_requests()
Supposing another scenario.
CPU0                                          CPU1

blk_mq_requeue_work()
  blk_mq_insert_request()  1) store
                                              virtblk_done()
                                                blk_mq_start_stopped_hw_queue()
  blk_mq_run_hw_queues()                          clear_bit(BLK_MQ_S_STOPPED)  3) store
                                                  blk_mq_run_hw_queue()
                                                    if (!blk_mq_hctx_has_pending())  4) load
                                                      return
                                                    blk_mq_sched_dispatch_requests()
    if (blk_mq_hctx_stopped())  2) load
      continue
    blk_mq_run_hw_queue()
Both scenarios are similar: a full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that BLK_MQ_S_STOPPED is cleared or CPU1 sees the dispatch list. Otherwise, one of the CPUs will fail to rerun the hardware queue, starving the request.
The easy fix is to add the required full memory barrier to the blk_mq_hctx_stopped() helper. To avoid penalizing the fast path (the hardware queue is not stopped most of the time), the barrier is inserted only into the slow path; only the slow path needs to care about a missed dispatch to the low-level device driver.
Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-4-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
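The shape of the helper after the change, as a close sketch (comments paraphrased):

  static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
  {
  	/* Fast path: the hw queue is not stopped most of the time. */
  	if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
  		return false;

  	/*
  	 * Slow path only: order the earlier dispatch-list/bitmap stores
  	 * against the re-check below. Pairs with the barrier in
  	 * blk_mq_start_stopped_hw_queue(), so a racing dispatcher either
  	 * sees BLK_MQ_S_STOPPED cleared or sees a non-empty dispatch list.
  	 */
  	smp_mb();

  	return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
  }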
|
#
679b1874 |
| 14-Oct-2024 |
Muchun Song <songmuchun@bytedance.com> |
block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding
commit 6bda857bcbb86fb9d0e54fbef93a093d51172acc upstream.
Supposing the following scenario.
CPU0                                          CPU1

blk_mq_insert_request()  1) store
                                              blk_mq_unquiesce_queue()
                                                blk_queue_flag_clear()  3) store
                                                blk_mq_run_hw_queues()
                                                  blk_mq_run_hw_queue()
                                                    if (!blk_mq_hctx_has_pending())  4) load
                                                      return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())  2) load
    return
  blk_mq_sched_dispatch_requests()
A full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED is cleared or CPU1 sees the dispatch list or the set bit in the software-queue bitmap. Otherwise, one of the CPUs will fail to rerun the hardware queue, causing starvation.
There are two possible fixes: 1) add a pair of memory barriers, or 2) use hctx->queue->queue_lock to synchronize QUEUE_FLAG_QUIESCED. Option 2) was chosen here, since memory barriers are not easy to maintain.
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
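The resulting shape inside blk_mq_run_hw_queue(), sketched as a fragment (blk_mq_hctx_has_pending() is a blk-mq-internal helper):

  bool need_run = !blk_queue_quiesced(q) && blk_mq_hctx_has_pending(hctx);

  if (!need_run) {
  	unsigned long flags;

  	/* Re-check under q->queue_lock, which blk_mq_unquiesce_queue()
  	 * also takes, so either the cleared QUEUE_FLAG_QUIESCED or the
  	 * pending request is guaranteed to be visible here. */
  	spin_lock_irqsave(&q->queue_lock, flags);
  	need_run = !blk_queue_quiesced(q) && blk_mq_hctx_has_pending(hctx);
  	spin_unlock_irqrestore(&q->queue_lock, flags);

  	if (!need_run)
  		return;
  }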
|
#
fe0d9800 |
| 14-Oct-2024 |
Muchun Song <songmuchun@bytedance.com> |
block: fix missing dispatching request when queue is started or unquiesced
commit 2003ee8a9aa14d766b06088156978d53c2e9be3d upstream.
Supposing the following scenario with a virtio_blk driver.
CPU0                          CPU1                          CPU2

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                                            virtblk_done()
                              blk_mq_try_issue_directly()
                                if (blk_mq_hctx_stopped())
                                  blk_mq_request_bypass_insert()
                                  blk_mq_run_hw_queue()       blk_mq_run_hw_queue()
                                  blk_mq_run_hw_queue()
                                    blk_mq_insert_request()
                                    return
After CPU0 has marked the queue as stopped, CPU1 will see the queue is stopped. But before CPU1 puts the request on the dispatch list, CPU2 receives the completion interrupt for a request, so it runs the hardware queue and marks the queue as non-stopped. Meanwhile, CPU1 also runs the same hardware queue. After both CPU1 and CPU2 complete blk_mq_run_hw_queue(), CPU1 just puts the request on the same hardware queue and returns, so the dispatch of a request is missed. Fix it by running the hardware queue explicitly. And blk_mq_request_issue_directly() should handle a similar situation. Fix it as well.
Fixes: d964f04a8fde ("blk-mq: fix direct issue") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-2-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
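The shape of the fix in blk_mq_try_issue_directly(), sketched (the 0 insert flag matches the 6.6 blk_insert_t usage):

  if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
  	blk_mq_insert_request(rq, 0);
  	/* Added: rerun the queue so a racing start/unquiesce that ran
  	 * before the insert cannot strand this request. */
  	blk_mq_run_hw_queue(hctx, false);
  	return;
  }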
|
Revision tags: v6.6.56, v6.6.55, v6.6.54 |
|
#
07588a58 |
| 30-Sep-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.53' into for/openbmc/dev-6.6
This is the 6.6.53 stable release
|
Revision tags: v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15 |
|
#
c20e89c9 |
| 30-Jan-2024 |
Hongyu Jin <hongyu.jin@unisoc.com> |
block: Fix where bio IO priority gets set
[ Upstream commit f3c89983cb4fc00be64eb0d5cbcfcdf2cacb965e ]
Commit 82b74cac2849 ("blk-ioprio: Convert from rqos policy to direct call") pushed setting bio I/O priority down into blk_mq_submit_bio() -- which is too low within block core's submit_bio() because it skips setting I/O priority for block drivers that implement fops->submit_bio() (e.g. DM, MD, etc).
Fix this by moving bio_set_ioprio() up from blk-mq.c to blk-core.c and call it from submit_bio(). This ensures all block drivers call bio_set_ioprio() during initial bio submission.
Fixes: a78418e6a04c ("block: Always initialize bio IO priority on submit") Co-developed-by: Yibin Ding <yibin.ding@unisoc.com> Signed-off-by: Yibin Ding <yibin.ding@unisoc.com> Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> [snitzer: revised commit header] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240130202638.62600-2-snitzer@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
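Where the call ends up, as a close sketch of submit_bio() after the change (accounting details abridged):

  void submit_bio(struct bio *bio)
  {
  	if (bio_op(bio) == REQ_OP_READ) {
  		task_io_account_read(bio->bi_iter.bi_size);
  		count_vm_events(PGPGIN, bio_sectors(bio));
  	} else if (bio_op(bio) == REQ_OP_WRITE) {
  		count_vm_events(PGPGOUT, bio_sectors(bio));
  	}

  	bio_set_ioprio(bio);	/* now runs for fops->submit_bio drivers too */
  	submit_bio_noacct(bio);
  }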
|
#
7e24a55b |
| 04-Aug-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.44' into for/openbmc/dev-6.6
This is the 6.6.44 stable release
|
#
b2c67e1f |
| 09-May-2024 |
Bart Van Assche <bvanassche@acm.org> |
block: Call .limit_depth() after .hctx has been set
[ Upstream commit 6259151c04d4e0085e00d2dcb471ebdd1778e72e ]
Call .limit_depth() after data->hctx has been set such that data->hctx can be used in .limit_depth() implementations.
Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Zhiguo Niu <zhiguo.niu@unisoc.com> Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240509170149.7639-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
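The reordering in __blk_mq_alloc_requests(), sketched with the elevator checks simplified:

  data->ctx  = blk_mq_get_ctx(q);
  data->hctx = blk_mq_map_queue(q, data->cmd_flags, data->ctx);

  /* Only now call .limit_depth(): implementations such as mq-deadline's
   * dd_limit_depth() may inspect data->hctx. */
  if (q->elevator && q->elevator->type->ops.limit_depth)
  	q->elevator->type->ops.limit_depth(data->cmd_flags, data);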
|
#
b181f702 |
| 12-Jun-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.33' into dev-6.6
This is the 6.6.33 stable release
|
#
e5d98cc3 |
| 09-May-2024 |
Yu Kuai <yukuai3@huawei.com> |
block: support to account io_ticks precisely
[ Upstream commit 99dc422335d8b2bd4d105797241d3e715bae90e9 ]
Currently, io_ticks is accounted based on sampling; specifically, update_io_ticks() always accounts io_ticks by 1 jiffy from bdev_start_io_acct()/blk_account_io_start(), and the result can be inaccurate. For example (HZ is 250):
Test script: fio -filename=/dev/sda -bs=4k -rw=write -direct=1 -name=test -thinktime=4ms
Test result: util is about 90%, while the disk is really idle.
This behaviour was introduced by commit 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting"); however, a key point was missed: that patch also improved performance a lot:
Before the commit:

  part_round_stats:
    if (part->stamp != now)
      stats |= 1;

    part_in_flight()
      -> there can be lots of tasks here in 1 jiffy.
    part_round_stats_single()
      __part_stat_add()
    part->stamp = now;

After the commit:

  update_io_ticks:
    stamp = part->bd_stamp;
    if (time_after(now, stamp))
      if (try_cmpxchg())
        __part_stat_add()
          -> only one task can reach here in 1 jiffy.
Hence, to account io_ticks precisely, we only need to check whether there is IO inflight, at most once per jiffy. Note that for rq-based devices, iterating tags should not be used here because 'tags->lock' is grabbed in blk_mq_find_and_get_req(); hence part_stat_lock_inc/dec() and part_in_flight() are used to track inflight IO. The additional overhead is quite small:

- per-cpu add/dec for each IO, for rq-based devices;
- a per-cpu sum once per jiffy.

This was verified with null-blk: there is no performance degradation under heavy IO pressure.
Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240509123717.3223892-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
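The resulting helper, as a close sketch of the upstream shape:

  static void update_io_ticks(struct block_device *part, unsigned long now,
  			    bool end)
  {
  	unsigned long stamp;
  again:
  	stamp = READ_ONCE(part->bd_stamp);
  	if (unlikely(time_after(now, stamp)) &&
  	    likely(try_cmpxchg(&part->bd_stamp, &stamp, now)) &&
  	    (end || part_in_flight(part)))
  		/* Charge the window only if IO was actually in flight. */
  		__part_stat_add(part, io_ticks, now - stamp);

  	if (part->bd_partno) {
  		part = bdev_whole(part);
  		goto again;
  	}
  }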
|
#
46eeaa11 |
| 03-Apr-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.24' into dev-6.6
This is the 6.6.24 stable release
|
#
cc80b5d7 |
| 27-Mar-2024 |
Damien Le Moal <dlemoal@kernel.org> |
block: Do not force full zone append completion in req_bio_endio()
commit 55251fbdf0146c252ceff146a1bb145546f3e034 upstream.
This reverts commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988.
Partial zone append completions cannot be supported, as there are no guarantees that the fragmented data will be written sequentially in the same manner as with a full command. Commit 748dc0b65ec2 ("block: fix partial zone append completion handling in req_bio_endio()") changed req_bio_endio() to always advance a partially failed BIO by its full length, but this can lead to incorrect accounting. So revert this change and let low-level device drivers handle this case by always failing zone append operations completely. With this revert, users will still see an IO error for a partially completed zone append BIO.
Fixes: 748dc0b65ec2 ("block: fix partial zone append completion handling in req_bio_endio()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240328004409.594888-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
c595db6d |
| 13-Mar-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.18' into dev-6.6
This is the 6.6.18 stable release
|
Revision tags: v6.6.14, v6.6.13, v6.6.12, v6.6.11 |
|
#
b2261c2e |
| 10-Jan-2024 |
Damien Le Moal <dlemoal@kernel.org> |
block: fix partial zone append completion handling in req_bio_endio()
[ Upstream commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988 ]
Partial completion of a zone append request is not allowed, but if a zone append completion indicates a number of completed bytes different from the original BIO size, only the BIO status is set to error. This leads to bio_advance() not setting the BIO size to 0 and thus to bio_endio() not being called at the end of req_bio_endio().
Make sure a partially completed zone append is failed and completed immediately by forcing the completed number of bytes (nbytes) to be equal to the BIO size, thus ensuring that bio_endio() is called.
Fixes: 297db731847e ("block: fix req_bio_endio append error handling") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240110092942.442334-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
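The change in req_bio_endio() (since reverted, see the revert entry above), sketched:

  if (unlikely(error)) {
  	bio->bi_status = error;
  	if (req_op(rq) == REQ_OP_ZONE_APPEND)
  		/* Force full advance so bio_advance() leaves zero bytes
  		 * remaining and bio_endio() is reached. */
  		nbytes = bio->bi_iter.bi_size;
  }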
|
#
87832e93 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.16' into dev-6.6
This is the 6.6.16 stable release
|
#
1188f7f1 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.14' into dev-6.6
This is the 6.6.14 stable release
|
#
1b0b4c42 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.13' into dev-6.6
This is the 6.6.13 stable release
|
#
6d8b0162 |
| 12-Jan-2024 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: fix IO hang from sbitmap wakeup race
[ Upstream commit 5266caaf5660529e3da53004b8b7174cab6374ed ]
In blk_mq_mark_tag_wait(), __add_wait_queue() may be reordered with the following blk_mq_get_driver_tag() when getting a driver tag fails.
Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe the waiter added in blk_mq_mark_tag_wait() and so wakes up nothing, while blk_mq_mark_tag_wait() also fails to get a driver tag.
This issue can be reproduced by running the following test in a loop; a fio hang can be observed within 30 minutes when running it on a test VM on my laptop.
modprobe -r scsi_debug
modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4
dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \
	--runtime=100 --numjobs=40 --time_based --name=test \
	--ioengine=libaio
Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(); the cost is fine, since the path only runs when we are out of tags.
Cc: Jan Kara <jack@suse.cz> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240112122626.4181044-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
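The placement of the barrier in blk_mq_mark_tag_wait(), sketched as a fragment (wait-queue locking elided):

  __add_wait_queue(wq, wait);

  /*
   * Order adding ourselves to the waitqueue before retrying the tag
   * allocation below; blk_mq_get_driver_tag() implies no barrier on
   * failure, so without this the wakeup side's waitqueue_active()
   * check may see no waiter while our retry also fails.
   */
  smp_mb();

  ret = blk_mq_get_driver_tag(rq);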
|
#
8b607504 |
| 12-Jan-2024 |
Jens Axboe <axboe@kernel.dk> |
block: ensure we hold a queue reference when using queue limits
[ Upstream commit 7b4f36cd22a65b750b4cb6ac14804fb7d6e6c67d ]
q_usage_counter is the only thing preventing the limits from changing under us in __bio_split_to_limits(), but blk_mq_submit_bio() doesn't hold it while calling into that function.
Move the splitting inside the region where we know we've got a queue reference. Ideally this could still remain a shared section of code, but let's keep the fix simple and defer any refactoring here to later.
Reported-by: Christoph Hellwig <hch@lst.de> Fixes: 900e08075202 ("block: move queue enter logic into blk_mq_submit_bio()") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
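The reordering in blk_mq_submit_bio(), sketched with error paths abridged (bio_queue_enter() takes the q_usage_counter reference):

  /* Enter the queue first ... */
  if (unlikely(bio_queue_enter(bio)))
  	return;

  /* ... then split: the limits can no longer change under us. */
  if (bio_may_exceed_limits(bio, &q->limits)) {
  	bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
  	if (!bio)
  		goto queue_exit;
  }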
|
Revision tags: v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4 |
|
#
511f6025 |
| 01-Dec-2023 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: don't count completed flush data request as inflight in case of quiesce
[ Upstream commit 0e4237ae8d159e3d28f3cd83146a46f576ffb586 ]
A request queue quiesce may interrupt the flush sequence, so the original request may have been marked COMPLETE but cannot finish because of the quiesce.
This is fine from the driver's viewpoint, because the flush sequence is a block layer concept and isn't related to the driver.
However, a driver (such as dm-rq) can call blk_mq_queue_inflight() to count and drain inflight requests, and then the wait-and-drain never finishes because the completed-but-not-finished flush request is counted as inflight.
Fix this issue by not counting a completed flush data request as inflight while the queue is quiesced.
Cc: Mike Snitzer <snitzer@kernel.org> Cc: David Jeffery <djeffery@redhat.com> Cc: John Pittman <jpittman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
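The check, sketched as a tag-iterator callback (names follow the real blk_mq_rq_inflight(); exact flag handling hedged):

  static bool rq_inflight_sketch(struct request *rq, void *priv)
  {
  	bool *busy = priv;

  	/* A flush-sequence request that is already COMPLETE while the
  	 * queue is quiesced is invisible to the driver: skip it. */
  	if (blk_queue_quiesced(rq->q) && blk_mq_request_completed(rq) &&
  	    (rq->rq_flags & RQF_FLUSH_SEQ))
  		return true;	/* keep iterating, not counted */

  	*busy = true;
  	return false;		/* inflight request found: stop iterating */
  }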
|