Revision tags: v6.6.25, v6.6.24, v6.6.23
# 15a3fc5c | 11-Mar-2024 | Mikulas Patocka <mpatocka@redhat.com>
dm: call the resume method on internal suspend
[ Upstream commit 65e8fbde64520001abf1c8d0e573561b4746ef38 ]
There is this reported crash when experimenting with the lvm2 testsuite. The list corruption is caused by the fact that the postsuspend and resume methods were not paired correctly; there were two consecutive calls to the origin_postsuspend function. The second call attempts to remove the "hash_list" entry from a list, while it was already removed by the first call.
Fix __dm_internal_resume so that it calls the preresume and resume methods of the table's targets.
If a preresume method of some target fails, we are in a tricky situation. We can't return an error because dm_internal_resume isn't supposed to return errors. We can't return success, because then the "resume" and "postsuspend" methods would not be paired correctly. So, we set the DMF_SUSPENDED flag and we fake normal suspend - it may confuse userspace tools, but it won't cause a kernel crash.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 8343 Comm: dmsetup Not tainted 6.8.0-rc6 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:__list_del_entry_valid_or_report+0x77/0xc0
<snip>
RSP: 0018:ffff8881b831bcc0 EFLAGS: 00010282
RAX: 000000000000004e RBX: ffff888143b6eb80 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff819053d0 RDI: 00000000ffffffff
RBP: ffff8881b83a3400 R08: 00000000fffeffff R09: 0000000000000058
R10: 0000000000000000 R11: ffffffff81a24080 R12: 0000000000000001
R13: ffff88814538e000 R14: ffff888143bc6dc0 R15: ffffffffa02e4bb0
FS:  00000000f7c0f780(0000) GS:ffff8893f0a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000057fb5000 CR3: 0000000143474000 CR4: 00000000000006b0
Call Trace:
 <TASK>
 ? die+0x2d/0x80
 ? do_trap+0xeb/0xf0
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? do_error_trap+0x60/0x80
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? exc_invalid_op+0x49/0x60
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? asm_exc_invalid_op+0x16/0x20
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ? __list_del_entry_valid_or_report+0x77/0xc0
 origin_postsuspend+0x1a/0x50 [dm_snapshot]
 dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
 dm_suspend+0xd8/0xf0 [dm_mod]
 dev_suspend+0x1f2/0x2f0 [dm_mod]
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ctl_ioctl+0x300/0x5f0 [dm_mod]
 dm_compat_ctl_ioctl+0x7/0x10 [dm_mod]
 __x64_compat_sys_ioctl+0x104/0x170
 do_syscall_64+0x184/0x1b0
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0xf7e6aead
<snip>
---[ end trace 0000000000000000 ]---
Fixes: ffcc39364160 ("dm: enhance internal suspend and resume interface") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
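A minimal sketch of the pairing described above (illustrative, not the literal upstream diff; it assumes the existing DM helpers dm_table_resume_targets() and the DMF_SUSPENDED flag):

    /* Illustrative: resume the targets on internal resume so preresume/resume
     * stay paired with postsuspend; if preresume fails we cannot return an
     * error here, so fake a normal suspend instead. */
    static void __dm_internal_resume_sketch(struct mapped_device *md,
                                            struct dm_table *map)
    {
            if (map && dm_table_resume_targets(map))
                    set_bit(DMF_SUSPENDED, &md->flags);     /* fake suspend */
            else
                    clear_bit(DMF_SUSPENDED, &md->flags);
    }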
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4
# a9ce3853 | 15-Sep-2023 | Jens Axboe <axboe@kernel.dk>
dm: don't attempt to queue IO under RCU protection
dm looks up the table for IO based on the request type, with an assumption that if the request is marked REQ_NOWAIT, it's fine to attempt to submit that IO while under RCU read lock protection. This is not OK, as REQ_NOWAIT just means that we should not be sleeping waiting on other IO, it does not mean that we can't potentially schedule.
A simple test case demonstrates this quite nicely:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(int argc, char *argv[])
{
        struct iovec iov;
        int fd;

        fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
        posix_memalign(&iov.iov_base, 4096, 4096);
        iov.iov_len = 4096;
        /* RWF_NOWAIT O_DIRECT read from a dm device triggers the splat below */
        preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
        return 0;
}
which will instantly spew:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5580, name: dm-nowait
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
CPU: 7 PID: 5580 Comm: dm-nowait Not tainted 6.6.0-rc1-g39956d2dcd81 #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x11d/0x1b0
 __might_resched+0x3c3/0x5e0
 ? preempt_count_sub+0x150/0x150
 mempool_alloc+0x1e2/0x390
 ? mempool_resize+0x7d0/0x7d0
 ? lock_sync+0x190/0x190
 ? lock_release+0x4b7/0x670
 ? internal_get_user_pages_fast+0x868/0x2d40
 bio_alloc_bioset+0x417/0x8c0
 ? bvec_alloc+0x200/0x200
 ? internal_get_user_pages_fast+0xb8c/0x2d40
 bio_alloc_clone+0x53/0x100
 dm_submit_bio+0x27f/0x1a20
 ? lock_release+0x4b7/0x670
 ? blk_try_enter_queue+0x1a0/0x4d0
 ? dm_dax_direct_access+0x260/0x260
 ? rcu_is_watching+0x12/0xb0
 ? blk_try_enter_queue+0x1cc/0x4d0
 __submit_bio+0x239/0x310
 ? __bio_queue_enter+0x700/0x700
 ? kvm_clock_get_cycles+0x40/0x60
 ? ktime_get+0x285/0x470
 submit_bio_noacct_nocheck+0x4d9/0xb80
 ? should_fail_request+0x80/0x80
 ? preempt_count_sub+0x150/0x150
 ? lock_release+0x4b7/0x670
 ? __bio_add_page+0x143/0x2d0
 ? iov_iter_revert+0x27/0x360
 submit_bio_noacct+0x53e/0x1b30
 submit_bio_wait+0x10a/0x230
 ? submit_bio_wait_endio+0x40/0x40
 __blkdev_direct_IO_simple+0x4f8/0x780
 ? blkdev_bio_end_io+0x4c0/0x4c0
 ? stack_trace_save+0x90/0xc0
 ? __bio_clone+0x3c0/0x3c0
 ? lock_release+0x4b7/0x670
 ? lock_sync+0x190/0x190
 ? atime_needs_update+0x3bf/0x7e0
 ? timestamp_truncate+0x21b/0x2d0
 ? inode_owner_or_capable+0x240/0x240
 blkdev_direct_IO.part.0+0x84a/0x1810
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 ? blkdev_read_iter+0x40d/0x530
 ? reacquire_held_locks+0x4e0/0x4e0
 ? __blkdev_direct_IO_simple+0x780/0x780
 ? rcu_is_watching+0x12/0xb0
 ? __mark_inode_dirty+0x297/0xd50
 ? preempt_count_add+0x72/0x140
 blkdev_read_iter+0x2a4/0x530
 do_iter_readv_writev+0x2f2/0x3c0
 ? generic_copy_file_range+0x1d0/0x1d0
 ? fsnotify_perm.part.0+0x25d/0x630
 ? security_file_permission+0xd8/0x100
 do_iter_read+0x31b/0x880
 ? import_iovec+0x10b/0x140
 vfs_readv+0x12d/0x1a0
 ? vfs_iter_read+0xb0/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 do_preadv+0x1b3/0x260
 ? do_readv+0x370/0x370
 __x64_sys_preadv2+0xef/0x150
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5af41ad806
Code: 41 54 41 89 fc 55 44 89 c5 53 48 89 cb 48 83 ec 18 80 3d e4 dd 0d 00 00 74 7a 45 89 c1 49 89 ca 45 31 c0 b8 47 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 be 00 00 00 48 85 c0 79 4a 48 8b 0d da 55
RSP: 002b:00007ffd3145c7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000147
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5af41ad806
RDX: 0000000000000001 RSI: 00007ffd3145c850 RDI: 0000000000000003
RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd3145c850 R14: 000055f5f0431dd8 R15: 0000000000000001
 </TASK>
where in fact it is dm itself that attempts to allocate a bio clone with GFP_NOIO under the rcu read lock, regardless of the request type.
Fix this by getting rid of the special casing for REQ_NOWAIT, and just use the normal SRCU protected table lookup. Get rid of the bio based table locking helpers at the same time, as they are now unused.
Cc: stable@vger.kernel.org Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Revision tags: v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34
# c4f512d2 | 13-Jun-2023 | Mike Snitzer <snitzer@kernel.org>
dm: skip dm-stats work in alloc_io() unless needed
Don't dm_stats_record_start() if dm_stats_used() is false.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
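In code terms the change amounts to something like the following sketch (dm_stats_used() and dm_stats_record_start() are the helpers named above; the surrounding context is illustrative):

    /* Only touch dm-stats state when statistics are actually in use. */
    if (unlikely(dm_stats_used(&md->stats)))
            dm_stats_record_start(&md->stats, &io->stats_aux);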
# 06eed768 | 13-Jun-2023 | Mike Snitzer <snitzer@kernel.org>
dm: avoid needless dm_io access if all IO accounting is disabled
Update dm_io_acct() to eliminate most dm_io struct accesses if both block core's IO stats and dm-stats are disabled.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 526d1006 | 12-Jun-2023 | Li Nan <linan122@huawei.com>
dm: support turning off block-core's io stats accounting
Commit bc58ba9468d9 ("block: add sysfs file for controlling io stats accounting") allowed users to turn off disk stat accounting completely by checking if queue flag QUEUE_FLAG_IO_STAT is set. In dm, this flag is neither set nor checked: so block-core's io stats are continuously counted and cannot be turned off.
Add support for turning off block-core's io stats accounting for dm. Set QUEUE_FLAG_IO_STAT for dm's request_queue. If QUEUE_FLAG_IO_STAT is set when an io starts, record the need for block core's io stats by setting the DM_IO_BLK_STAT dm_io flag to avoid io stats being disabled in the middle of the io.
DM statistics (dm-stats) is independent of block-core's io stats and remains unchanged.
Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
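A sketch of the latch being described (the queue-flag test and the DM_IO_BLK_STAT name come from the text; the exact flag manipulation is illustrative):

    /* Sample QUEUE_FLAG_IO_STAT once at IO start so a later sysfs toggle
     * cannot leave start/end accounting unbalanced for this io. */
    if (blk_queue_io_stat(md->queue))
            io->flags |= DM_IO_BLK_STAT;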
# be04c14a | 14-Jun-2023 | Mike Snitzer <snitzer@kernel.org>
dm: use op specific max_sectors when splitting abnormal io
Split abnormal IO in terms of the corresponding operation specific max_sectors (max_discard_sectors, max_secure_erase_sectors or max_write_zeroes_sectors).
This fixes a significant dm-thinp discard performance regression that was introduced with commit e2dd8aca2d76 ("dm bio prison v1: improve concurrent IO performance"). Relative to discard: max_discard_sectors is used instead of max_sectors; which fixes excessive discard splitting (e.g. max_sectors=128K vs max_discard_sectors=64M).
Tested by discarding a 1 Petabyte dm-thin device:
  lvcreate -V 1125899906842624B -T test/pool -n thin
  time blkdiscard /dev/test/thin
Before this fix (splitting discards every 128K): ~116m
After this fix (splitting discards every 64M): 0m33.460s
Reported-by: Zorro Lang <zlang@redhat.com> Fixes: 06961c487a33 ("dm: split discards further if target sets max_discard_granularity") Requires: 13f6facf3fae ("dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE") Fixes: e2dd8aca2d76 ("dm bio prison v1: improve concurrent IO performance") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
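The selection boils down to something like the following sketch (the helper name is illustrative; the queue_limits fields are the ones named above):

    /* Pick the per-operation limit when splitting abnormal IO. */
    static unsigned int abnormal_io_max_sectors(const struct queue_limits *lim,
                                                struct bio *bio)
    {
            switch (bio_op(bio)) {
            case REQ_OP_DISCARD:
                    return lim->max_discard_sectors;
            case REQ_OP_SECURE_ERASE:
                    return lim->max_secure_erase_sectors;
            case REQ_OP_WRITE_ZEROES:
                    return lim->max_write_zeroes_sectors;
            default:
                    return lim->max_sectors;
            }
    }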
Revision tags: v6.1.33, v6.1.32
# 2760904d | 01-Jun-2023 | Li Lingfeng <lilingfeng3@huawei.com>
dm: don't lock fs when the map is NULL during suspend or resume
As described in commit 38d11da522aa ("dm: don't lock fs when the map is NULL in process of resume"), a deadlock may be triggered between do_resume() and do_mount().
This commit preserves the fix from commit 38d11da522aa but moves it to where it also serves to fix a similar deadlock between do_suspend() and do_mount(). It does so, if the active map is NULL, by clearing DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both do_suspend() and do_resume().
Fixes: 38d11da522aa ("dm: don't lock fs when the map is NULL in process of resume") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
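A minimal sketch of where the check now sits (illustrative; DM_SUSPEND_LOCKFS_FLAG and the suspend_lock-protected map lookup are existing DM constructs):

    /* With no active map there is nothing to freeze, and taking the fs
     * lock can deadlock against a concurrent mount. */
    map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
    if (!map)
            suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;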
# 05bdb996 | 08-Jun-2023 | Christoph Hellwig <hch@lst.de>
block: replace fmode_t with a block-specific type for block open flags
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 2736e8ee | 08-Jun-2023 | Christoph Hellwig <hch@lst.de>
block: use the holder as indication for exclusive opens
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key exclusive opens off a non-NULL holder.
For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# ae220766 | 08-Jun-2023 | Christoph Hellwig <hch@lst.de>
block: remove the unused mode argument to ->release
The mode argument to the ->release block_device_operation is never used, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# d32e2bf8 | 08-Jun-2023 | Christoph Hellwig <hch@lst.de>
block: pass a gendisk to ->open
->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
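Taken together with the ->release and blk_mode_t changes above, the affected block_device_operations methods end up with roughly these prototypes (shown for orientation; treat the exact list as a sketch of the series' net effect):

    struct block_device_operations {
            int  (*open)(struct gendisk *disk, blk_mode_t mode);
            void (*release)(struct gendisk *disk);
            int  (*ioctl)(struct block_device *bdev, blk_mode_t mode,
                          unsigned int cmd, unsigned long arg);
            /* ... other members unchanged ... */
    };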
# 0718afd4 | 01-Jun-2023 | Christoph Hellwig <hch@lst.de>
block: introduce holder ops
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for things like notification of a removed device or a device resize.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
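For orientation, a sketch of the structure as introduced (it has since grown additional callbacks; treat the member list as illustrative):

    struct blk_holder_ops {
            /* called back when the claimed device goes away */
            void (*mark_dead)(struct block_device *bdev);
    };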
Revision tags: v6.1.31
# 57bbf99c | 25-May-2023 | Tejun Heo <tj@kernel.org>
dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND
==========
When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. They may get executed in any order and simultaneously. When fully serialized execution - one by one in the queueing order - is needed, an ordered workqueue should be used which can be created with alloc_ordered_workqueue().
However, alloc_ordered_workqueue() was a later addition. Before it, an ordered workqueue could be obtained by creating an UNBOUND workqueue with @max_active==1. This originally was an implementation side-effect which was broken by 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues"). Because there were users that depended on the ordered execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") made the workqueue allocation path implicitly promote UNBOUND workqueues w/ @max_active==1 to ordered workqueues.
While this has worked okay, overloading the UNBOUND allocation interface this way creates other issues. It's difficult to tell whether a given workqueue actually needs to be ordered, and users that legitimately want a min concurrency level wq unexpectedly get an ordered one instead. With planned UNBOUND workqueue updates to improve execution locality and more prevalence of chiplet designs which can benefit from such improvements, this isn't a state we want to be in forever.
This patch series audits all callsites that create an UNBOUND workqueue w/ @max_active==1 and converts them to alloc_ordered_workqueue() as necessary.
WHAT TO LOOK FOR
================
The conversions are from
alloc_workqueue(WQ_UNBOUND | flags, 1, args..)
to
alloc_ordered_workqueue(flags, args...)
which don't cause any functional changes. If you know that fully ordered execution is not necessary, please let me know. I'll drop the conversion and instead add a comment noting the fact to reduce confusion while conversion is in progress.
If you aren't fully sure, it's completely fine to let the conversion through. The behavior will stay exactly the same and we can always reconsider later.
As there are follow-up workqueue core changes, I'd really appreciate if the patch can be routed through the workqueue tree w/ your acks. Thanks.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: dm-devel@redhat.com Cc: linux-kernel@vger.kernel.org
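A concrete instance of the conversion pattern above (the workqueue name and flags are illustrative, not necessarily the exact dm-integrity call site):

    /* Before: ordering relied on the UNBOUND + max_active==1 side-effect. */
    wq = alloc_workqueue("dm-%s-metadata", WQ_MEM_RECLAIM | WQ_UNBOUND, 1, name);

    /* After: the ordering requirement is explicit. */
    wq = alloc_ordered_workqueue("dm-%s-metadata", WQ_MEM_RECLAIM, name);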
Revision tags: v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25
# f7995089 | 14-Apr-2023 | Mike Snitzer <snitzer@kernel.org>
dm: unexport dm_get_queue_limits()
There are no dm_get_queue_limits() callers outside of DM core and there shouldn't be.
Also, remove its BUG_ON(!atomic_read(&md->holders)) to micro-optimize __process_abnormal_io().
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 13f6facf | 14-Apr-2023 | Mike Snitzer <snitzer@kernel.org>
dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE
Introduce max_write_zeroes_granularity and max_secure_erase_granularity flags in the dm_target struct.
If a target sets these then DM core will split IO of these operation types accordingly (in terms of max_write_zeroes_sectors and max_secure_erase_sectors respectively).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
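How a target would opt in, per the description (a sketch inside a target constructor; the granularity flag names come from the message, the surrounding fields are illustrative):

    /* Request op-specific splitting for write zeroes and secure erase. */
    ti->num_write_zeroes_bios = 1;
    ti->max_write_zeroes_granularity = true;
    ti->num_secure_erase_bios = 1;
    ti->max_secure_erase_granularity = true;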
Revision tags: v6.1.24
# 8a8da082 | 07-Apr-2023 | Mike Christie <michael.christie@oracle.com>
dm: Add support for block PR read keys/reservation
This adds support in dm for the block PR read keys and read reservation callouts.
Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-7-michael.christie@oracle.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Revision tags: v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15
# 06961c48 | 01-Mar-2023 | Mike Snitzer <snitzer@kernel.org>
dm: split discards further if target sets max_discard_granularity
The block core (bio_split_discard) will already split discards based on the 'discard_granularity' and 'max_discard_sectors' queue_limits. But the DM thin target also needs to ensure that it doesn't receive a discard that spans a 'max_discard_sectors' boundary.
Introduce a dm_target 'max_discard_granularity' flag that if set will cause DM core to split discard bios relative to 'max_discard_sectors'. This treats 'discard_granularity' as a "min_discard_granularity" and 'max_discard_sectors' as a "max_discard_granularity".
Requested-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 666eed46 | 30-Mar-2023 | Mike Snitzer <snitzer@kernel.org>
dm: fix __send_duplicate_bios() to always allow for splitting IO
Commit 7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting") only called setup_split_accounting() from __send_duplicate_bios() if a single bio were being issued. But the case where duplicate bios are issued must call it too.
Otherwise the bio won't be split and resubmitted (via recursion through block core back to DM) to submit the later portions of a bio (which may map to an entirely different target).
For example, when discarding an entire DM striped device with the following DM table:
  vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
  vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before (broken, discards the first striped target's devices twice):
  device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
  device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
  device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
  device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528
After (works as expected):
  device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
  device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
  device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
  device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528
Fixes: 7dd76d1feec70 ("dm: improve bio splitting and associated IO accounting") Cc: stable@vger.kernel.org Reported-by: Orange Kao <orange@aiven.io> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# f7b58a69 | 30-Mar-2023 | Mike Snitzer <snitzer@kernel.org>
dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm: allow dm_ac
dm: fix improper splitting for abnormal bios
"Abnormal" bios include discards, write zeroes and secure erase. By no longer passing the calculated 'len' pointer, commit 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios") took a senseless approach to disallowing dm_accept_partial_bio() from working for duplicate bios processed using __send_duplicate_bios().
It inadvertently and incorrectly stopped the use of 'len' when initializing a target's io (in alloc_tio). As such the resulting tio could address more area of a device than it should.
For example, when discarding an entire DM striped device with the following DM table:
  vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
  vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
Before this fix:
  device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
  blkdiscard: attempt to access beyond end of device
  loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
  device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
  blkdiscard: attempt to access beyond end of device
  loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
After this fix:
  device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
  device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
Fixes: 7dd06a2548b2 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios") Cc: stable@vger.kernel.org Reported-by: Orange Kao <orange@aiven.io> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# d3aa3e06 | 16-Mar-2023 | Jiasheng Jiang <jiasheng@iscas.ac.cn>
dm stats: check for and propagate alloc_percpu failure
Check alloc_percpu()'s return value and return an error from dm_stats_init() if it fails. Update alloc_dev() to fail if dm_stats_init() does.
Otherwise, a NULL pointer dereference will occur in dm_stats_cleanup() even if dm-stats isn't being actively used.
Fixes: fd2ed4d25270 ("dm: add statistics support") Cc: stable@vger.kernel.org Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
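The shape of the fix, per the description (a sketch, not the literal diff; the percpu member follows dm-stats naming):

    /* Report percpu allocation failure instead of ignoring it; alloc_dev()
     * in turn fails device creation if this returns an error. */
    int dm_stats_init(struct dm_stats *stats)
    {
            mutex_init(&stats->mutex);
            INIT_LIST_HEAD(&stats->list);
            stats->last = alloc_percpu(struct dm_stats_last_position);
            if (!stats->last)
                    return -ENOMEM;
            return 0;
    }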
Revision tags: v6.1.14
# 5f275713 | 23-Feb-2023 | Yu Kuai <yukuai3@huawei.com>
block: count 'ios' and 'sectors' when io is done for bio-based device
While using iostat for raid, I occasionally observed very strange 'await' values, and it turns out this is because 'ios' and 'sectors' are counted in bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct(). I'm not sure why they are counted like that, but this behaviour is obviously wrong because users will get wrong disk stats.
Fix the problem by counting 'ios' and 'sectors' when io is done, like what rq-based device does.
Fixes: 394ffa503bc4 ("blk: introduce generic io stat accounting help function") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revision tags: v6.1.13, v6.2
# d695e441 | 16-Feb-2023 | XU pengfei <xupengfei@nfschina.com>
dm: remove unnecessary (void*) conversion in event_callback()
Pointer variables of void * type do not require type cast.
Signed-off-by: XU pengfei <xupengfei@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
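The pattern being cleaned up (illustrative, mirroring event_callback()'s void *context parameter):

    /* Before: redundant cast of a void * on assignment. */
    struct mapped_device *md = (struct mapped_device *) context;

    /* After: void * converts implicitly in C. */
    struct mapped_device *md = context;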
# f77692d6 | 16-Feb-2023 | Mike Snitzer <snitzer@kernel.org>
dm: add cond_resched() to dm_wq_requeue_work()
Otherwise the while() loop in dm_wq_requeue_work() can result in a "dead loop" on systems that have preemption disabled. This is particularly problematic on single cpu systems.
Fixes: 8b211aaccb915 ("dm: add two stage requeue mechanism") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org>
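The general shape of the fix, which also applies to the dm_wq_work() change below (a sketch; the per-item helpers are illustrative, only cond_resched() and the loop structure come from the message):

    /* Yield the CPU between items so the requeue drain loop cannot become
     * a "dead loop" when preemption is disabled (e.g. a single-CPU box). */
    while ((io = pop_requeued_io(md)) != NULL) {    /* pop_requeued_io() is illustrative */
            resubmit_io(io);                        /* illustrative helper */
            cond_resched();
    }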
# 0ca44fce | 15-Feb-2023 | Pingfan Liu <piliu@redhat.com>
dm: add cond_resched() to dm_wq_work()
Otherwise the while() loop in dm_wq_work() can result in a "dead loop" on systems that have preemption disabled. This is particularly problematic on single cpu systems.
Cc: stable@vger.kernel.org Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Revision tags: v6.1.12
# 0b22ff53 | 14-Feb-2023 | Mike Snitzer <snitzer@kernel.org>
dm: remove flush_scheduled_work() during local_exit()
Commit acfe0ad74d2e1 ("dm: allocate a special workqueue for deferred device removal") switched from using system workqueue to a single workqueue local to DM. But it didn't eliminate the call to flush_scheduled_work() that was introduced purely for the benefit of deferred device removal with commit 2c140a246dc ("dm: allow remove to be deferred").
Since DM core uses its own workqueue (and queue_work) there is no need to call flush_scheduled_work() from local_exit(). local_exit()'s destroy_workqueue(deferred_remove_workqueue) handles flushing work started with queue_work().
Fixes: acfe0ad74d2e1 ("dm: allocate a special workqueue for deferred device removal") Signed-off-by: Mike Snitzer <snitzer@kernel.org>