Revision tags: v4.13.16, v4.14, v4.13.5 |
|
#
0b508bc9 |
| 26-Sep-2017 |
Shaohua Li <shli@fb.com> |
block: fix a build error
The code is only for blkcg not for all cgroups
Fixes: d4478e92d618 ("block/loop: make loop cgroup aware") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off
block: fix a build error
The code is only for blkcg not for all cgroups
Fixes: d4478e92d618 ("block/loop: make loop cgroup aware") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
d4478e92 |
| 25-Sep-2017 |
Shaohua Li <shli@fb.com> |
block/loop: make loop cgroup aware
loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO loop device received, so the dispatched IO loses the cgroup co
block/loop: make loop cgroup aware
loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO loop device received, so the dispatched IO loses the cgroup context.
I'm ignoring buffer IO case now, which is quite complicated. Making the loop thread aware cgroup context doesn't really help. The loop device only writes to a single file. In current writeback cgroup implementation, the file can only belong to one cgroup.
For direct IO case, we could workaround the issue in theory. For example, say we assign cgroup1 5M/s BW for loop device and cgroup2 10M/s. We can create a special cgroup for loop thread and assign at least 15M/s for the underlayer disk. In this way, we correctly throttle the two cgroups. But this is tricky to setup.
This patch tries to address the issue. We record bio's css in loop command. When loop thread is handling the command, we then use the API provided in patch 1 to set the css for current task. The bio layer will use the css for new IO (from patch 3).
Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
bf093753 |
| 05-Sep-2017 |
Omar Sandoval <osandov@fb.com> |
loop: set physical block size to logical block size
Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE") caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the physical bl
loop: set physical block size to logical block size
Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE") caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the physical block size still makes the most sense semantically, but let's just lie and always set it to the same value as the logical block size (same goes for io_min). In the future we might want to at least bump up io_min to PAGE_SIZE but I'm sick of these stupid changes so let's play it safe.
1: https://marc.info/?l=linux-xfs&m=150459024723753&w=2
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v4.13 |
|
#
92d77332 |
| 01-Sep-2017 |
Shaohua Li <shli@fb.com> |
block/loop: fix use after free
lo_rw_aio->call_read_iter-> 1 aops->direct_IO 2 iov_iter_revert lo_rw_aio_complete could happen between 1 and 2, the bio and bvec could be freed before 2,
block/loop: fix use after free
lo_rw_aio->call_read_iter-> 1 aops->direct_IO 2 iov_iter_revert lo_rw_aio_complete could happen between 1 and 2, the bio and bvec could be freed before 2, which accesses bvec.
Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
40326d8a |
| 01-Sep-2017 |
Shaohua Li <shli@fb.com> |
block/loop: allow request merge for directio mode
Currently loop disables merge. While it makes sense for buffer IO mode, directio mode can benefit from request merge. Without merge, loop could send
block/loop: allow request merge for directio mode
Currently loop disables merge. While it makes sense for buffer IO mode, directio mode can benefit from request merge. Without merge, loop could send small size IO to underlayer disk and harm performance.
Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
54bb0ade |
| 01-Sep-2017 |
Shaohua Li <shli@fb.com> |
block/loop: set hw_sectors
Loop can handle any size of request. Limiting it to 255 sectors just burns the CPU for bio split and request merge for underlayer disk and also cause bad fs block allocati
block/loop: set hw_sectors
Loop can handle any size of request. Limiting it to 255 sectors just burns the CPU for bio split and request merge for underlayer disk and also cause bad fs block allocation in directio mode.
Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
43cade80 |
| 24-Aug-2017 |
Omar Sandoval <osandov@fb.com> |
loop: fold loop_switch() into callers
The comments here are really outdated, and blk-mq made flushing much simpler, so just fold the two cases into the callers.
Reviewed-by: Ming Lei <ming.lei@redh
loop: fold loop_switch() into callers
The comments here are really outdated, and blk-mq made flushing much simpler, so just fold the two cases into the callers.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
89e4fdec |
| 24-Aug-2017 |
Omar Sandoval <osandov@fb.com> |
loop: add ioctl for changing logical block size
This is a different approach from the first attempt in f2c6df7dbf9a ("loop: support 4k physical blocksize"). Rather than extending LOOP_{GET,SET}_STAT
loop: add ioctl for changing logical block size
This is a different approach from the first attempt in f2c6df7dbf9a ("loop: support 4k physical blocksize"). Rather than extending LOOP_{GET,SET}_STATUS, add a separate ioctl just for setting the block size.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
6c6b6f28 |
| 24-Aug-2017 |
Omar Sandoval <osandov@fb.com> |
loop: set physical block size to PAGE_SIZE
The physical block size is "the lowest possible sector size that the hardware can operate on without reverting to read-modify-write operations" (from the c
loop: set physical block size to PAGE_SIZE
The physical block size is "the lowest possible sector size that the hardware can operate on without reverting to read-modify-write operations" (from the comment on blk_queue_physical_block_size()). Since loop does buffered I/O on the backing file by default, the RMW unit is a page. This isn't the case for direct I/O mode, but let's keep it simple.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
8a0740c4 |
| 24-Aug-2017 |
Omar Sandoval <osandov@fb.com> |
loop: get rid of lo_blocksize
This is only used for setting the soft block size on the struct block_device once and then never used again.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: H
loop: get rid of lo_blocksize
This is only used for setting the soft block size on the struct block_device once and then never used again.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
1e6ec9ea |
| 23-Aug-2017 |
Omar Sandoval <osandov@fb.com> |
Revert "loop: support 4k physical blocksize"
There's some stuff still up in the air, let's not get stuck with a subpar ABI. I'll follow up with something better for 4.14.
Signed-off-by: Omar Sandov
Revert "loop: support 4k physical blocksize"
There's some stuff still up in the air, let's not get stuck with a subpar ABI. I'll follow up with something better for 4.14.
Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
a8c1d064 |
| 07-Aug-2017 |
Anton Volkov <avolkov@ispras.ru> |
loop: fix to a race condition due to the early registration of device
The early device registration made possible a race leading to allocations of disks with wrong minors.
This patch moves the devi
loop: fix to a race condition due to the early registration of device
The early device registration made possible a race leading to allocations of disks with wrong minors.
This patch moves the device registration further down the loop_init function to make the race infeasible.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Volkov <avolkov@ispras.ru> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v4.12 |
|
#
abbb6589 |
| 27-May-2017 |
Christoph Hellwig <hch@lst.de> |
fs: implement vfs_iter_write using do_iter_write
De-dupliate some code and allow for passing the flags argument to vfs_iter_write. Additionally it now properly updates timestamps.
Signed-off-by: C
fs: implement vfs_iter_write using do_iter_write
De-dupliate some code and allow for passing the flags argument to vfs_iter_write. Additionally it now properly updates timestamps.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
18e9710e |
| 27-May-2017 |
Christoph Hellwig <hch@lst.de> |
fs: implement vfs_iter_read using do_iter_read
De-dupliate some code and allow for passing the flags argument to vfs_iter_read. Additional it properly updates atime now.
Signed-off-by: Christoph H
fs: implement vfs_iter_read using do_iter_read
De-dupliate some code and allow for passing the flags argument to vfs_iter_read. Additional it properly updates atime now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
b2ee7d46 |
| 16-Jun-2017 |
NeilBrown <neilb@suse.com> |
loop: Add PF_LESS_THROTTLE to block/loop device thread.
When a filesystem is mounted from a loop device, writes are throttled by balance_dirty_pages() twice: once when writing to the filesystem and
loop: Add PF_LESS_THROTTLE to block/loop device thread.
When a filesystem is mounted from a loop device, writes are throttled by balance_dirty_pages() twice: once when writing to the filesystem and once when the loop_handle_cmd() writes to the backing file. This double-throttling can trigger positive feedback loops that create significant delays. The throttling at the lower level is seen by the upper level as a slow device, so it throttles extra hard.
The PF_LESS_THROTTLE flag was created to handle exactly this circumstance, though with an NFS filesystem mounted from a local NFS server. It reduces the throttling on the lower layer so that it can proceed largely unthrottled.
To demonstrate this, create a filesystem on a loop device and write (e.g. with dd) several large files which combine to consume significantly more than the limit set by /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total time taken.
When I do this directly on a device (no loop device) the total time for several runs (mkfs, mount, write 200 files, umount) is fairly stable: 28-35 seconds. When I do this over a loop device the times are much worse and less stable. 52-460 seconds. Half below 100seconds, half above. When I apply this patch, the times become stable again, though not as fast as the no-loop-back case: 53-72 seconds.
There may be room for further improvement as the total overhead still seems too high, but this is a big improvement.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
fc17b653 |
| 03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
blk-mq: switch ->queue_rq return value to blk_status_t
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a re
blk-mq: switch ->queue_rq return value to blk_status_t
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
2a842aca |
| 03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
block: introduce new block status code type
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead int
block: introduce new block status code type
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later.
For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it.
blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
b040ad9c |
| 09-Jun-2017 |
Arnd Bergmann <arnd@arndb.de> |
loop: fix error handling regression
gcc points out an unusual indentation:
drivers/block/loop.c: In function 'loop_set_status': drivers/block/loop.c:1149:3: error: this 'if' clause does not guard..
loop: fix error handling regression
gcc points out an unusual indentation:
drivers/block/loop.c: In function 'loop_set_status': drivers/block/loop.c:1149:3: error: this 'if' clause does not guard... [-Werror=misleading-indentation] if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit, ^~ drivers/block/loop.c:1152:4: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if' goto exit;
This was introduced by a new feature that accidentally moved the opening braces from one condition to another. Adding a second pair of braces makes it work correctly again and also more readable.
Fixes: f2c6df7dbf9a ("loop: support 4k physical blocksize") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
f2c6df7d |
| 08-Jun-2017 |
Hannes Reinecke <hare@suse.de> |
loop: support 4k physical blocksize
When generating bootable VM images certain systems (most notably s390x) require devices with 4k blocksize. This patch implements a new flag 'LO_FLAGS_BLOCKSIZE' w
loop: support 4k physical blocksize
When generating bootable VM images certain systems (most notably s390x) require devices with 4k blocksize. This patch implements a new flag 'LO_FLAGS_BLOCKSIZE' which will set the physical blocksize to that of the underlying device, and allow to change the logical blocksize for up to the physical blocksize.
Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
51001b7d |
| 08-Jun-2017 |
Hannes Reinecke <hare@suse.de> |
loop: Remove unused 'bdev' argument from loop_set_capacity
Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
64604957 |
| 08-Jun-2017 |
James Wang <jnwang@suse.com> |
Fix loop device flush before configure v3
While installing SLES-12 (based on v4.4), I found that the installer will stall for 60+ seconds during LVM disk scan. The root cause was determined to be t
Fix loop device flush before configure v3
While installing SLES-12 (based on v4.4), I found that the installer will stall for 60+ seconds during LVM disk scan. The root cause was determined to be the removal of a bound device check in loop_flush() by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").
Restoring this check, examining ->lo_state as set by loop_set_fd() eliminates the bad behavior.
Test method: modprobe loop max_loop=64 dd if=/dev/zero of=disk bs=512 count=200K for((i=0;i<4;i++))do losetup -f disk; done mkfs.ext4 -F /dev/loop0 for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done for f in `ls /dev/loop[0-9]*|sort`; do \ echo $f; dd if=$f of=/dev/null bs=512 count=1; \ done
Test output: stock patched /dev/loop0 18.1217e-05 8.3842e-05 /dev/loop1 6.1114e-05 0.000147979 /dev/loop10 0.414701 0.000116564 /dev/loop11 0.7474 6.7942e-05 /dev/loop12 0.747986 8.9082e-05 /dev/loop13 0.746532 7.4799e-05 /dev/loop14 0.480041 9.3926e-05 /dev/loop15 1.26453 7.2522e-05
Note that from loop10 onward, the device is not mounted, yet the stock kernel consumes several orders of magnitude more wall time than it does for a mounted device. (Thanks for Mike Galbraith <efault@gmx.de>, give a changelog review.)
Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: James Wang <jnwang@suse.com> Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq") Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14 |
|
#
d6296d39 |
| 01-May-2017 |
Christoph Hellwig <hch@lst.de> |
blk-mq: update ->init_request and ->exit_request prototypes
Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous chec
blk-mq: update ->init_request and ->exit_request prototypes
Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway.
Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
Revision tags: v4.10.13, v4.10.12 |
|
#
08e0029a |
| 20-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
blk-mq: remove the error argument to blk_mq_complete_request
Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, a
blk-mq: remove the error argument to blk_mq_complete_request
Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
fe2cb290 |
| 20-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
loop: zero-fill bio on the submitting cpu
In thruth I've just audited which blk-mq drivers don't currently have a complete callback, but I think this change is at least borderline useful.
Signed-of
loop: zero-fill bio on the submitting cpu
In thruth I've just audited which blk-mq drivers don't currently have a complete callback, but I think this change is at least borderline useful.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
Revision tags: v4.10.11, v4.10.10, v4.10.9 |
|
#
48920ff2 |
| 05-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can kill this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mart
block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can kill this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|