#
a782483c |
| 26-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
block: remove the nr_sects field in struct hd_struct
Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync.
Additiona
block: remove the nr_sects field in struct hd_struct
Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync.
Additionally the field in hd_struct did not use proper serialization, possibly allowing for torn writes. By only using the block_device field this problem also gets fixed.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
37c3fc9a |
| 25-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
block: simplify the block device claiming interface
Stop passing the whole device as a separate argument given that it can be trivially deducted and cleanup the !holder debug check.
Signed-off-by:
block: simplify the block device claiming interface
Stop passing the whole device as a separate argument given that it can be trivially deducted and cleanup the !holder debug check.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
a954ea81 |
| 23-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
block: remove ->bd_contains
Now that each hd_struct has a reference to the corresponding block_device, there is no need for the bd_contains pointer. Add a bdev_whole() helper to look up the whole d
block: remove ->bd_contains
Now that each hd_struct has a reference to the corresponding block_device, there is no need for the bd_contains pointer. Add a bdev_whole() helper to look up the whole device block_device struture instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
4e7b5671 |
| 23-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
block: remove i_bdev
Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup s
block: remove i_bdev
Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
f46f2a31 |
| 16-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
loop: do not call set_blocksize
set_blocksize is used by file systems to use their preferred buffer cache block size. Block drivers should not set it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
loop: do not call set_blocksize
set_blocksize is used by file systems to use their preferred buffer cache block size. Block drivers should not set it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
449f4ec9 |
| 16-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
block: remove the update_bdev parameter to set_capacity_revalidate_and_notify
The update_bdev argument is always set to true, so remove it. Also rename the function to the slighly less verbose set_
block: remove the update_bdev parameter to set_capacity_revalidate_and_notify
The update_bdev argument is always set to true, so remove it. Also rename the function to the slighly less verbose set_capacity_and_notify, as propagating the disk size to the block device isn't really revalidation.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
3b4f85d0 |
| 16-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
loop: let set_capacity_revalidate_and_notify update the bdev size
There is no good reason to call revalidate_disk_size separately.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes
loop: let set_capacity_revalidate_and_notify update the bdev size
There is no good reason to call revalidate_disk_size separately.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
8410d38c |
| 29-Oct-2020 |
Christoph Hellwig <hch@lst.de> |
loop: use __register_blkdev to allocate devices on demand
Use the simpler mechanism attached to major_name to allocate a brd device when a currently unregistered minor is accessed.
Signed-off-by: C
loop: use __register_blkdev to allocate devices on demand
Use the simpler mechanism attached to major_name to allocate a brd device when a currently unregistered minor is accessed.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
7a2f0ce1 |
| 03-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
loop: use set_disk_ro
Use set_disk_ro instead of set_device_ro to match all other block drivers and to ensure all partitions mirror the read-only flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
loop: use set_disk_ro
Use set_disk_ro instead of set_device_ro to match all other block drivers and to ensure all partitions mirror the read-only flag.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
fd6625a1 |
| 22-Feb-2021 |
Mauricio Faria de Oliveira <mfo@canonical.com> |
loop: fix I/O error on fsync() in detached loop devices
commit 4ceddce55eb35d15b0f87f5dcf6f0058fd15d3a4 upstream.
There's an I/O error on fsync() in a detached loop device if it has been previously
loop: fix I/O error on fsync() in detached loop devices
commit 4ceddce55eb35d15b0f87f5dcf6f0058fd15d3a4 upstream.
There's an I/O error on fsync() in a detached loop device if it has been previously attached.
The issue is write cache is enabled in the attach path in loop_configure() but it isn't disabled in the detach path; thus it remains enabled in the block device regardless of whether it is attached or not.
Now fsync() can get an I/O request that will just be failed later in loop_queue_rq() as device's state is not 'Lo_bound'.
So, disable write cache in the detach path.
Do so based on the queue flag, not the loop device flag for read-only (used to enable) as the queue flag can be changed via sysfs even on read-only loop devices (e.g., losetup -r.)
Test-case:
# DEV=/dev/loop7
# IMG=/tmp/image # truncate --size 1M $IMG
# losetup $DEV $IMG # losetup -d $DEV
Before:
# strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = -1 EIO (Input/output error) Warning: Error fsyncing/closing /dev/loop7: Input/output error [ 982.529929] blk_update_request: I/O error, dev loop7, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
After:
# strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = 0
Co-developed-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
a7e18f57 |
| 18-Jun-2021 |
Kristian Klausen <kristian@klausen.dk> |
loop: Fix missing discard support when using LOOP_CONFIGURE
commit 2b9ac22b12a266eb4fec246a07b504dd4983b16b upstream.
Without calling loop_config_discard() the discard flag and parameters aren't se
loop: Fix missing discard support when using LOOP_CONFIGURE
commit 2b9ac22b12a266eb4fec246a07b504dd4983b16b upstream.
Without calling loop_config_discard() the discard flag and parameters aren't set/updated for the loop device and worst-case they could indicate discard support when it isn't the case (ex: if the LOOP_SET_STATUS ioctl was used with a different file prior to LOOP_CONFIGURE).
Cc: <stable@vger.kernel.org> # 5.8.x- Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl") Signed-off-by: Kristian Klausen <kristian@klausen.dk> Link: https://lore.kernel.org/r/20210618115157.31452-1-kristian@klausen.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
c01a21b7 |
| 12-Nov-2020 |
Petr Vorel <pvorel@suse.cz> |
loop: Fix occasional uevent drop
Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") causes an occasional drop of loop device uevent, which are no longer triggered in loop_set
loop: Fix occasional uevent drop
Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") causes an occasional drop of loop device uevent, which are no longer triggered in loop_set_size() but in a different part of code.
Bug is reproducible with LTP test uevent01 [1]:
i=0; while true; do i=$((i+1)); echo "== $i ==" lsmod |grep -q loop && rmmod -f loop ./uevent01 || break done
Put back triggering through code called in loop_set_size().
Fix required to add yet another parameter to set_capacity_revalidate_and_notify().
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c
[hch: rebased on a different change to the prototype of set_capacity_revalidate_and_notify]
Cc: stable@vger.kernel.org # v5.9 Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") Reported-by: <ltp@lists.linux.it> Signed-off-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61 |
|
#
611bee52 |
| 23-Aug-2020 |
Christoph Hellwig <hch@lst.de> |
block: replace bd_set_size with bd_set_nr_sectors
Replace bd_set_size with a version that takes the number of sectors instead, as that fits most of the current and future callers much better.
Signe
block: replace bd_set_size with bd_set_nr_sectors
Replace bd_set_size with a version that takes the number of sectors instead, as that fits most of the current and future callers much better.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
79e5dc59 |
| 25-Aug-2020 |
Martijn Coenen <maco@android.com> |
loop: Set correct device size when using LOOP_CONFIGURE
The device size calculation was done before processing the loop configuration, which meant that the we set the size on the underlying block de
loop: Set correct device size when using LOOP_CONFIGURE
The device size calculation was done before processing the loop configuration, which meant that the we set the size on the underlying block device incorrectly in case lo_offset/lo_sizelimit were set in the configuration. Delay computing the size until we've setup the device parameters correctly.
Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl") Reported-by: Lennart Poettering <mzxreary@0pointer.de> Tested-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
df561f66 |
| 23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through mar
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
show more ...
|
Revision tags: v5.8.3, v5.4.60, v5.8.2, v5.4.59 |
|
#
bcb21c8c |
| 17-Aug-2020 |
Ming Lei <ming.lei@redhat.com> |
block: loop: set discard granularity and alignment for block device backed loop
In case of block device backend, if the backend supports write zeros, the loop device will set queue flag of QUEUE_FLA
block: loop: set discard granularity and alignment for block device backed loop
In case of block device backend, if the backend supports write zeros, the loop device will set queue flag of QUEUE_FLAG_DISCARD. However, limits.discard_granularity isn't setup, and this way is wrong, see the following description in Documentation/ABI/testing/sysfs-block:
A discard_granularity of 0 means that the device does not support discard functionality.
Especially 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()") starts to take q->limits.discard_granularity for computing max discard sectors. And zero discard granularity may cause kernel oops, or fail discard request even though the loop queue claims discard support via QUEUE_FLAG_DISCARD.
Fix the issue by setup discard granularity and alignment.
Fixes: c52abf563049 ("loop: Better discard support for block devices") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Xiao Ni <xni@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Evan Green <evgreen@chromium.org> Cc: Gwendal Grignou <gwendal@chromium.org> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.8.1, v5.4.58 |
|
#
fe6a8fc5 |
| 10-Aug-2020 |
Lennart Poettering <mzxreary@0pointer.de> |
loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN we need to propagate this into the GENHD_FL_NO_PART_SCAN. LOOP_SETSTATUS does this, LOOP_CONFIG
loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
When LOOP_CONFIGURE is used with LO_FLAGS_PARTSCAN we need to propagate this into the GENHD_FL_NO_PART_SCAN. LOOP_SETSTATUS does this, LOOP_CONFIGURE doesn't so far. Effect is that setting up a loopback device with partition scanning doesn't actually work when LOOP_CONFIGURE is issued, though it works fine with LOOP_SETSTATUS.
Let's correct that and propagate the flag in LOOP_CONFIGURE too.
Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")
Signed-off-by: Lennart Poettering <lennart@poettering.net> Acked-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53 |
|
#
ecbe6bc0 |
| 16-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
block: use bd_prepare_to_claim directly in the loop driver
The arcane magic in bd_start_claiming is only needed to be able to claim a block_device that hasn't been fully set up. Switch the loop dri
block: use bd_prepare_to_claim directly in the loop driver
The arcane magic in bd_start_claiming is only needed to be able to claim a block_device that hasn't been fully set up. Switch the loop driver that claims from the ioctl path with a fully set up struct block_device to just use the much simpler bd_prepare_to_claim directly.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48 |
|
#
200f9337 |
| 19-Jun-2020 |
Luis Chamberlain <mcgrof@kernel.org> |
loop: be paranoid on exit and prevent new additions / removals
Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit.
Signed-off-by: Luis Chamberlain
loop: be paranoid on exit and prevent new additions / removals
Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.7.4, v5.7.3, v5.4.47 |
|
#
15f73f5b |
| 11-Jun-2020 |
Christoph Hellwig <hch@lst.de> |
blk-mq: move failure injection out of blk_mq_complete_request
Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error
blk-mq: move failure injection out of blk_mq_complete_request
Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superflous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted.
Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
f4bd34b1 |
| 17-Jun-2020 |
Zheng Bin <zhengbin13@huawei.com> |
loop: replace kill_bdev with invalidate_bdev
When a filesystem is mounted on a loop device and on a loop ioctl LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting destroyed. ki
loop: replace kill_bdev with invalidate_bdev
When a filesystem is mounted on a loop device and on a loop ioctl LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting destroyed. kill_bdev truncate_inode_pages truncate_inode_pages_range do_invalidatepage block_invalidatepage discard_buffer -->clear BH_Mapped flag
sb_bread __bread_gfp bh = __getblk_gfp -->discard_buffer clear BH_Mapped flag __bread_slow submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON
Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.4.46, v5.7.2, v5.4.45, v5.7.1 |
|
#
6ac92fb5 |
| 04-Jun-2020 |
Martijn Coenen <maco@android.com> |
loop: Fix wrong masking of status flags
In faf1d25440d6, loop_set_status() now assigns lo_status directly from the passed in lo_flags, but then fixes it up by masking out flags that can't be set by
loop: Fix wrong masking of status flags
In faf1d25440d6, loop_set_status() now assigns lo_status directly from the passed in lo_flags, but then fixes it up by masking out flags that can't be set by LOOP_SET_STATUS; unfortunately the mask was negated.
Re-ran all ltp ioctl_loop tests, and they all passed.
Pass run of the previously failing one:
tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s tst_device.c:88: INFO: Found free device 0 '/dev/loop0' ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0 ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0 ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file = '/tmp/ZRJ6H4/test.img' ioctl_loop01.c:65: PASS: get expected lo_flag 12 ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1 ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1 ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds
Summary: passed 8 failed 0 skipped 0 warnings 0
Fixes: faf1d25440d6 ("loop: Clean up LOOP_SET_STATUS lo_flags handling") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Martijn Coenen <maco@android.com> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.4.44 |
|
#
a37b0715 |
| 01-Jun-2020 |
NeilBrown <neilb@suse.de> |
mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where
mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE
PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a daemon needs to write to one bdi (the final bdi) in order to free up writes queued to another bdi (the client bdi).
The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty pages, so that it can still dirty pages after other processses have been throttled. The purpose of this is to avoid deadlock that happen when the PF_LESS_THROTTLE process must write for any dirty pages to be freed, but it is being thottled and cannot write.
This approach was designed when all threads were blocked equally, independently on which device they were writing to, or how fast it was. Since that time the writeback algorithm has changed substantially with different threads getting different allowances based on non-trivial heuristics. This means the simple "add 25%" heuristic is no longer reliable.
The important issue is not that the daemon needs a *larger* dirty page allowance, but that it needs a *private* dirty page allowance, so that dirty pages for the "client" bdi that it is helping to clear (the bdi for an NFS filesystem or loop block device etc) do not affect the throttling of the daemon writing to the "final" bdi.
This patch changes the heuristic so that the task is not throttled when the bdi it is writing to has a dirty page count below below (or equal to) the free-run threshold for that bdi. This ensures it will always be able to have some pages in flight, and so will not deadlock.
In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might still be throttled by global threshold, but that is acceptable as it is only the deadlock state that is interesting for this flag.
This approach of "only throttle when target bdi is busy" is consistent with the other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes attention to be focussed only on the target bdi.
So this patch - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, - removes the 25% bonus that that flag gives, and - If PF_LOCAL_THROTTLE is set, don't delay at all unless the global and the local free-run thresholds are exceeded.
Note that previously realtime threads were treated the same as PF_LESS_THROTTLE threads. This patch does *not* change the behvaiour for real-time threads, so it is now different from the behaviour of nfsd and loop tasks. I don't know what is wanted for realtime.
[akpm@linux-foundation.org: coding style fixes] Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Chuck Lever <chuck.lever@oracle.com> [nfsd] Cc: Christoph Hellwig <hch@lst.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.7 |
|
#
bf0beec0 |
| 29-May-2020 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: drain I/O when all CPUs in a hctx are offline
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup up queue mapping. Thomas mentioned the following point[1]:
"That was the
blk-mq: drain I/O when all CPUs in a hctx are offline
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup up queue mapping. Thomas mentioned the following point[1]:
"That was the constraint of managed interrupts from the very beginning:
The driver/subsystem has to quiesce the interrupt line and the associated queue _before_ it gets shutdown in CPU unplug and not fiddle with it until it's restarted by the core when the CPU is plugged in again."
However, current blk-mq implementation doesn't quiesce hw queue before the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so there isn't any chance to quiesce the hctx before shutting down the CPU.
Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs where the last CPU goes away, and wait for completion of in-flight requests. This guarantees that there is no inflight I/O before shutting down the managed IRQ.
Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need to wait for completion of in-flight requests from these drivers to avoid a potential dead-lock. It is safe to do this for stacking drivers as those do not use interrupts at all and their I/O completions are triggered by underlying devices I/O completion.
[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[hch: different retry mechanism, merged two patches, minor cleanups]
Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.4.43 |
|
#
d29b92f5 |
| 24-May-2020 |
Colin Ian King <colin.king@canonical.com> |
loop: remove redundant assignment to variable error
The variable error is being assigned a value that is never read so the assignment is redundant and can be removed.
Addresses-Coverity: ("Unused v
loop: remove redundant assignment to variable error
The variable error is being assigned a value that is never read so the assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|