Searched hist:b5dd2f6047ca108001328aac0e8588edd15f1778 (Results 1 – 1 of 1) sorted by relevance
/openbmc/linux/drivers/block/
loop.c | diff 4d4e41aef9429872ea3b105e83426941f7185ab6 Tue May 05 06:49:55 CDT 2015 Ming Lei <ming.lei@canonical.com> block: loop: avoiding too many pending per work I/O
If there are too many pending per-work I/Os, too many high-priority worker threads can be generated, and system performance can be affected.
This patch limits the max_active parameter of the workqueue to 16.
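For illustration, a minimal sketch of what capping a workqueue's concurrency looks like, assuming a global workqueue named "kloopd" with the flags shown (the name, flags, and init helper are illustrative assumptions, not necessarily the exact loop.c code):

    #include <linux/init.h>
    #include <linux/errno.h>
    #include <linux/workqueue.h>

    static struct workqueue_struct *loop_wq;

    static int __init loop_init_wq(void)
    {
            /*
             * The third argument is max_active: at most 16 work items of
             * this workqueue execute concurrently, so no more than 16
             * high-priority worker threads are spawned on its behalf.
             */
            loop_wq = alloc_workqueue("kloopd",
                                      WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND,
                                      16);
            return loop_wq ? 0 : -ENOMEM;
    }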
This patch fixes a Fedora 22 live-boot performance regression when booting from squashfs over dm based on loop; the following reasons appear to be related to the problem:
- unlike other filesystems (such as ext4), squashfs is a bit special: I observed that increasing the number of I/O jobs accessing files in squashfs improves I/O performance only a little, whereas it makes a big difference for ext4
- nested loop: both squashfs.img and ext3fs.img are mounted as loop block devices, and ext3fs.img is inside the squashfs
- during booting, lots of tasks may run concurrently
Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778
Cc: stable@vger.kernel.org (v4.0)
Cc: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

diff f4aa4c7bbac6c4afdd4adccf90898c1a3685396d Tue May 05 06:49:54 CDT 2015 Ming Lei <ming.lei@canonical.com> block: loop: convert to per-device workqueue
Documentation/workqueue.txt: if there is a dependency among multiple work items used during memory reclaim, they should be queued to separate workqueues, each with WQ_MEM_RECLAIM.
Loop devices can be stacked, so we have to convert to a per-device workqueue. One example is the Fedora live CD.
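A rough sketch of the per-device allocation, assuming a workqueue field in struct loop_device and a setup helper (the field, helper name, and flags are assumptions for illustration, not the exact loop.c code):

    #include <linux/errno.h>
    #include <linux/workqueue.h>

    struct loop_device {
            int lo_number;
            struct workqueue_struct *wq;    /* one workqueue per loop device */
            /* ... other fields ... */
    };

    static int loop_prepare_queue(struct loop_device *lo)
    {
            /*
             * Each device gets its own WQ_MEM_RECLAIM workqueue (and thus
             * its own rescuer), so stacked loop devices never depend on a
             * shared workqueue for forward progress during memory reclaim.
             */
            lo->wq = alloc_workqueue("kloopd%d",
                                     WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND,
                                     16, lo->lo_number);
            return lo->wq ? 0 : -ENOMEM;
    }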
Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778
Cc: stable@vger.kernel.org (v4.0)
Cc: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

diff b5dd2f6047ca108001328aac0e8588edd15f1778 Wed Dec 31 07:22:57 CST 2014 Ming Lei <ming.lei@canonical.com> block: loop: improve performance via blk-mq
The conversion is fairly straightforward: a workqueue is used to dispatch requests of the loop block device. One big change is that requests are now submitted to the backing file/device concurrently via the workqueue, so throughput may improve considerably. Since write requests over the same file often run exclusively, they are not handled concurrently, which avoids extra context-switch cost, possible lock contention, and work-scheduling cost. Also, with blk-mq there is an opportunity to get loop I/O merged before it is submitted to the backing file/device.
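A condensed sketch of this dispatch path under blk-mq, using era-appropriate APIs but with illustrative per-device and per-request structures (field and function names are assumptions, not the exact loop.c code): reads get one work item each and run concurrently, while writes are collected on a list and drained by a single per-device work item so they stay serialized:

    #include <linux/blk-mq.h>
    #include <linux/workqueue.h>
    #include <linux/spinlock.h>
    #include <linux/list.h>

    struct loop_device {
            struct workqueue_struct *wq;
            struct work_struct write_work;     /* single writer: writes run serially */
            struct list_head write_cmd_list;   /* pending write requests */
            spinlock_t lock;
    };

    struct loop_cmd {
            struct work_struct read_work;      /* per-request: reads run concurrently */
            struct request *rq;
            struct list_head list;
    };

    static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
                             const struct blk_mq_queue_data *bd)
    {
            struct loop_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
            struct loop_device *lo = bd->rq->q->queuedata;

            blk_mq_start_request(bd->rq);
            cmd->rq = bd->rq;

            if (rq_data_dir(bd->rq) == WRITE) {
                    /* queue the write and kick the single per-device writer */
                    spin_lock_irq(&lo->lock);
                    list_add_tail(&cmd->list, &lo->write_cmd_list);
                    spin_unlock_irq(&lo->lock);
                    queue_work(lo->wq, &lo->write_work);
            } else {
                    /* reads are dispatched to the workqueue individually */
                    queue_work(lo->wq, &cmd->read_work);
            }

            return BLK_MQ_RQ_QUEUE_OK;
    }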
In the following test:
- base: v3.19-rc2-2041231
- loop over file in ext4 file system on SSD disk
- bs: 4k, libaio, io depth: 64, O_DIRECT, num of jobs: 1
- throughput: IOPS
    ----------------------------------------------------
    |           |  base  | base with loop-mq |  delta  |
    ----------------------------------------------------
    | randread  |  1740  |       25318       | +1355%  |
    ----------------------------------------------------
    | read      | 42196  |       51771       | +22.6%  |
    ----------------------------------------------------
    | randwrite | 35709  |       34624       |   -3%   |
    ----------------------------------------------------
    | write     | 39137  |       40326       |   +3%   |
    ----------------------------------------------------
So loop-mq can improve throughput for both read and randread; meanwhile, the performance of write and randwrite is basically not hurt.
Another benefit is that the loop driver code gets much simpler after the blk-mq conversion, so the patch can be thought of as a cleanup too.
Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>