#
963ab9e5 |
| 02-Aug-2012 |
Asias He <asias@redhat.com> |
block: Introduce __blk_segment_map_sg() helper
Split the mapping code in blk_rq_map_sg() to a helper __blk_segment_map_sg(), so that other mapping function, e.g. blk_bio_map_sg(), can share the code
block: Introduce __blk_segment_map_sg() helper
Split the mapping code in blk_rq_map_sg() to a helper __blk_segment_map_sg(), so that other mapping function, e.g. blk_bio_map_sg(), can share the code.
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Suggested-by: Jens Axboe <axboe@kernel.dk> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3, v3.3-rc7, v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3 |
|
#
050c8ea8 |
| 08-Feb-2012 |
Tejun Heo <tj@kernel.org> |
block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions
blk_rq_merge_ok() is the elevator-neutral part of merge eligibility test. blk_try_merge() determines merge directio
block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions
blk_rq_merge_ok() is the elevator-neutral part of merge eligibility test. blk_try_merge() determines merge direction and expects the caller to have tested elv_rq_merge_ok() previously.
elv_rq_merge_ok() now wraps blk_rq_merge_ok() and then calls elv_iosched_allow_merge(). elv_try_merge() is removed and the two callers are updated to call elv_rq_merge_ok() explicitly followed by blk_try_merge(). While at it, make rq_merge_ok() functions return bool.
This is to prepare for plug merge update and doesn't introduce any behavior change.
This is based on Jens' patch to skip elevator_allow_merge_fn() from plug merge.
Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <4F16F3CA.90904@kernel.dk> Original-patch-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5, v3.2-rc4, v3.2-rc3, v3.2-rc2, v3.2-rc1, v3.1, v3.1-rc10, v3.1-rc9, v3.1-rc8, v3.1-rc7, v3.1-rc6, v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1, v3.0, v3.0-rc7, v3.0-rc6, v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1, v2.6.39, v2.6.39-rc7, v2.6.39-rc6, v2.6.39-rc5, v2.6.39-rc4, v2.6.39-rc3, v2.6.39-rc2, v2.6.39-rc1 |
|
#
5e84ea3a |
| 21-Mar-2011 |
Jens Axboe <jaxboe@fusionio.com> |
block: attempt to merge with existing requests on plug flush
One of the disadvantages of on-stack plugging is that we potentially lose out on merging since all pending IO isn't always visible to eve
block: attempt to merge with existing requests on plug flush
One of the disadvantages of on-stack plugging is that we potentially lose out on merging since all pending IO isn't always visible to everybody. When we flush the on-stack plugs, right now we don't do any checks to see if potential merge candidates could be utilized.
Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE. It works just ELEVATOR_INSERT_SORT, but first checks whether we can merge with an existing request before doing the insertion (if we fail merging).
This fixes a regression with multiple processes issuing IO that can be merged.
Thanks to Shaohua Li <shaohua.li@intel.com> for testing and fixing an accounting bug.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.38, v2.6.38-rc8, v2.6.38-rc7, v2.6.38-rc6, v2.6.38-rc5, v2.6.38-rc4, v2.6.38-rc3, v2.6.38-rc2, v2.6.38-rc1 |
|
#
6c23a968 |
| 07-Jan-2011 |
Jens Axboe <jaxboe@fusionio.com> |
block: add internal hd part table references
We can't use krefs since it's apparently restricted to very basic reference counting.
This reverts commit e4a683c8.
Signed-off-by: Jens Axboe <jaxboe@f
block: add internal hd part table references
We can't use krefs since it's apparently restricted to very basic reference counting.
This reverts commit e4a683c8.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
#
09e099d4 |
| 05-Jan-2011 |
Jerome Marchand <jmarchan@redhat.com> |
block: fix accounting bug on cross partition merges
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 140
block: fix accounting bug on cross partition merges
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
The detailed root cause is as follows.
Assuming that there are two partition, sda1 and sda2.
1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1.
| hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 ---------------------------
2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed.
| hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 ---------------------------
3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented.
| hd_struct->in_flight --------------------------- sda1 | -1 sda2 | 1 ---------------------------
The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do.
Also add a refcount to struct hd_struct to keep the partition in memory as long as users exist. We use kref_test_and_get() to ensure we don't add a reference to a partition which is going away.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.37, v2.6.37-rc8, v2.6.37-rc7, v2.6.37-rc6, v2.6.37-rc5 |
|
#
e692cb66 |
| 01-Dec-2010 |
Martin K. Petersen <martin.petersen@oracle.com> |
block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that cou
block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice.
There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver.
The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD.
Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.37-rc4, v2.6.37-rc3, v2.6.37-rc2, v2.6.37-rc1 |
|
#
f253b86b |
| 24-Oct-2010 |
Jens Axboe <jaxboe@fusionio.com> |
Revert "block: fix accounting bug on cross partition merges"
This reverts commit 7681bfeeccff5efa9eb29bf09249a3c400b15327.
Conflicts:
include/linux/genhd.h
It has numerous issues with the cleanu
Revert "block: fix accounting bug on cross partition merges"
This reverts commit 7681bfeeccff5efa9eb29bf09249a3c400b15327.
Conflicts:
include/linux/genhd.h
It has numerous issues with the cleanup path and non-elevator devices. Revert it for now so we can come up with a clean version without rushing things.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.36 |
|
#
7681bfee |
| 19-Oct-2010 |
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> |
block: fix accounting bug on cross partition merges
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 140
block: fix accounting bug on cross partition merges
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
The detailed root cause is as follows.
Assuming that there are two partition, sda1 and sda2.
1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1.
| hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 ---------------------------
2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed.
| hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 ---------------------------
3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented.
| hd_struct->in_flight --------------------------- sda1 | -1 sda2 | 1 ---------------------------
The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do.
When reloading partition tables, quiesce IO to ensure that no request references to the partition struct exists. When it is safe to free the partition table, the IO for that device is restarted again.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.36-rc8, v2.6.36-rc7, v2.6.36-rc6 |
|
#
f281fb5f |
| 25-Sep-2010 |
Adrian Hunter <adrian.hunter@nokia.com> |
block: prevent merges of discard and write requests
Add logic to prevent two I/O requests being merged if only one of them is a discard. Ditto secure discard.
Without this fix, it is possible for
block: prevent merges of discard and write requests
Add logic to prevent two I/O requests being merged if only one of them is a discard. Ditto secure discard.
Without this fix, it is possible for write requests to transform into discard requests. For example:
Submit bio 1 to discard 8 sectors from sector n Submit bio 2 to write 8 sectors from sector n + 16 Submit bio 3 to write 8 sectors from sector n + 8
Bio 1 becomes request 1. Bio 2 becomes request 2. Bio 3 is merged with request 2, and then subsequently request 2 is merged with request 1 resulting in just one I/O request which discards all 24 sectors.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com>
(Moved the checks above the position checks /Jens)
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.36-rc5, v2.6.36-rc4 |
|
#
13f05c8d |
| 10-Sep-2010 |
Martin K. Petersen <martin.petersen@oracle.com> |
block/scsi: Provide a limit on the number of integrity segments
Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle.
Introduc
block/scsi: Provide a limit on the number of integrity segments
Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle.
Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware.
Add support for honoring the integrity segment limit when merging both bios and requests.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
show more ...
|
Revision tags: v2.6.36-rc3, v2.6.36-rc2, v2.6.36-rc1, v2.6.35, v2.6.35-rc6, v2.6.35-rc5, v2.6.35-rc4 |
|
#
2c8919de |
| 21-Jun-2010 |
Andi Kleen <andi@firstfloor.org> |
gcc-4.6: block: fix unused but set variables in blk-merge
Just some dead code.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
gcc-4.6: block: fix unused but set variables in blk-merge
Just some dead code.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
#
7b6d91da |
| 07-Aug-2010 |
Christoph Hellwig <hch@lst.de> |
block: unify flags for struct bio and struct request
Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem d
block: unify flags for struct bio and struct request
Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
#
33659ebb |
| 07-Aug-2010 |
Christoph Hellwig <hch@lst.de> |
block: remove wrappers for request type/flags
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct requests. This allows much easier grepping for different request types
block: remove wrappers for request type/flags
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct requests. This allows much easier grepping for different request types instead of unwinding through macros.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
show more ...
|
Revision tags: v2.6.35-rc3, v2.6.35-rc2, v2.6.35-rc1, v2.6.34, v2.6.34-rc7, v2.6.34-rc6, v2.6.34-rc5, v2.6.34-rc4, v2.6.34-rc3, v2.6.34-rc2, v2.6.34-rc1 |
|
#
8a78362c |
| 25-Feb-2010 |
Martin K. Petersen <martin.petersen@oracle.com> |
block: Consolidate phys_segment and hw_segment limits
Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit.
S
block: Consolidate phys_segment and hw_segment limits
Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
Revision tags: v2.6.33, v2.6.33-rc8, v2.6.33-rc7, v2.6.33-rc6, v2.6.33-rc5, v2.6.33-rc4, v2.6.33-rc3, v2.6.33-rc2, v2.6.33-rc1, v2.6.32, v2.6.32-rc8, v2.6.32-rc7, v2.6.32-rc6, v2.6.32-rc5, v2.6.32-rc4 |
|
#
316d315b |
| 06-Oct-2009 |
Nikanth Karthikesan <knikanth@suse.de> |
block: Seperate read and write statistics of in_flight requests v2
Commit a9327cac440be4d8333bba975cbbf76045096275 added seperate read and write statistics of in_flight requests. And exported the nu
block: Seperate read and write statistics of in_flight requests v2
Commit a9327cac440be4d8333bba975cbbf76045096275 added seperate read and write statistics of in_flight requests. And exported the number of read and write requests in progress seperately through sysfs.
But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange output from "iostat -kx 2". Global values for service time and utilization were garbage. For interval values, utilization was always 100%, and service time is higher than normal.
So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b
The problem was in part_round_stats_single(), I missed the following: if (now == part->stamp) return;
- if (part->in_flight) { + if (part_in_flight(part)) { __part_stat_add(cpu, part, time_in_queue, part_in_flight(part) * (now - part->stamp)); __part_stat_add(cpu, part, io_ticks, (now - part->stamp));
With this chunk included, the reported regression gets fixed.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
-- Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
Revision tags: v2.6.32-rc3 |
|
#
0f78ab98 |
| 04-Oct-2009 |
Jens Axboe <jens.axboe@oracle.com> |
Revert "Seperate read and write statistics of in_flight requests"
This reverts commit a9327cac440be4d8333bba975cbbf76045096275.
Corrado Zoccolo <czoccolo@gmail.com> reports:
"with 2.6.32-rc1 I sta
Revert "Seperate read and write statistics of in_flight requests"
This reverts commit a9327cac440be4d8333bba975cbbf76045096275.
Corrado Zoccolo <czoccolo@gmail.com> reports:
"with 2.6.32-rc1 I started getting the following strange output from "iostat -kx 2": Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle 10,70 0,00 3,16 15,75 0,00 70,38
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 18,22 0,00 0,67 0,01 14,77 0,02 43,94 0,01 10,53 39043915,03 2629219,87 sdb 60,89 9,68 50,79 3,04 1724,43 50,52 65,95 0,70 13,06 488437,47 2629219,87
avg-cpu: %user %nice %system %iowait %steal %idle 2,72 0,00 0,74 0,00 0,00 96,53
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
avg-cpu: %user %nice %system %iowait %steal %idle 6,68 0,00 0,99 0,00 0,00 92,33
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00
avg-cpu: %user %nice %system %iowait %steal %idle 4,40 0,00 0,73 1,47 0,00 93,40
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 4,00 0,00 3,00 0,00 28,00 18,67 0,06 19,50 333,33 100,00
Global values for service time and utilization are garbage. For interval values, utilization is always 100%, and service time is higher than normal.
I bisected it down to: [a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write statistics of in_flight requests and verified that reverting just that commit indeed solves the issue on 2.6.32-rc1."
So until this is debugged, revert the bad commit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
Revision tags: v2.6.32-rc1, v2.6.32-rc2 |
|
#
a9327cac |
| 11-Sep-2009 |
Nikanth Karthikesan <knikanth@suse.de> |
Seperate read and write statistics of in_flight requests
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to
Seperate read and write statistics of in_flight requests
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two seperate counters for read and write.
This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
#
da6c5c72 |
| 11-Sep-2009 |
Tejun Heo <tj@kernel.org> |
scsi,block: update SCSI to handle mixed merge failures
Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to
scsi,block: update SCSI to handle mixed merge failures
Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to merge requests with different failfast settings and still behave correctly on errors. Allow merge of requests of different failfast settings.
As SCSI is currently the only subsystem which follows failfast status, there's no need to worry about other block drivers for now.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
Revision tags: v2.6.31, v2.6.31-rc9, v2.6.31-rc8, v2.6.31-rc7, v2.6.31-rc6, v2.6.31-rc5, v2.6.31-rc4, v2.6.31-rc3, v2.6.31-rc2 |
|
#
80a761fd |
| 03-Jul-2009 |
Tejun Heo <tj@kernel.org> |
block: implement mixed merge of different failfast requests
Failfast has characteristics from other attributes. When issuing, executing and successuflly completing requests, failfast doesn't make a
block: implement mixed merge of different failfast requests
Failfast has characteristics from other attributes. When issuing, executing and successuflly completing requests, failfast doesn't make any difference. It only affects how a request is handled on failure. Allowing requests with different failfast settings to be merged cause normal IOs to fail prematurely while not allowing has performance penalties as failfast is used for read aheads which are likely to be located near in-flight or to-be-issued normal IOs.
This patch introduces the concept of 'mixed merge'. A request is a mixed merge if it is merge of segments which require different handling on failure. Currently the only mixable attributes are failfast ones (or lack thereof).
When a bio with different failfast settings is added to an existing request or requests of different failfast settings are merged, the merged request is marked mixed. Each bio carries failfast settings and the request always tracks failfast state of the first bio. When the request fails, blk_rq_err_bytes() can be used to determine how many bytes can be safely failed without crossing into an area which requires further retrials.
This allows request merging regardless of failfast settings while keeping the failure handling correct.
This patch only implements mixed merge but doesn't enable it. The next one will update SCSI to make use of mixed merge.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
#
ab0fd1de |
| 03-Jul-2009 |
Tejun Heo <tj@kernel.org> |
block: don't merge requests of different failfast settings
Block layer used to merge requests and bios with different failfast settings. This caused regular IOs to fail prematurely when they were m
block: don't merge requests of different failfast settings
Block layer used to merge requests and bios with different failfast settings. This caused regular IOs to fail prematurely when they were merged into failfast requests for readahead.
Niel Lambrechts could trigger the problem semi-reliably on ext4 when resuming from STR. ext4 uses readahead when reading inodes and combined with the deterministic extra SATA PHY exception cycle during resume on the specific configuration, non-readahead inode read would fail causing ext4 errors. Please read the following thread for details.
http://lkml.org/lkml/2009/5/23/21
This patch makes block layer reject merging if the failfast settings don't match. This is correct but likely to lower IO performance by preventing regular IOs from mingling into surrounding readahead requests. Changes to allow such mixed merges and handle errors correctly will be added later.
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: Theodore Tso <tytso@mit.edu> Signed-off-by: Jens Axboe <axboe@carl.(none)>
show more ...
|
Revision tags: v2.6.31-rc1, v2.6.30, v2.6.30-rc8, v2.6.30-rc7 |
|
#
ae03bf63 |
| 22-May-2009 |
Martin K. Petersen <martin.petersen@oracle.com> |
block: Use accessor functions for queue limits
Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly.
Signed-off-by: Martin K.
block: Use accessor functions for queue limits
Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
Revision tags: v2.6.30-rc6, v2.6.30-rc5 |
|
#
a2dec7b3 |
| 07-May-2009 |
Tejun Heo <tj@kernel.org> |
block: hide request sector and data_len
Block low level drivers for some reason have been pretty good at abusing block layer API. Especially struct request's fields tend to get violated in all poss
block: hide request sector and data_len
Block low level drivers for some reason have been pretty good at abusing block layer API. Especially struct request's fields tend to get violated in all possible ways. Make it clear that low level drivers MUST NOT access or manipulate rq->sector and rq->data_len directly by prefixing them with double underscores.
This change is also necessary to break build of out-of-tree codes which assume the previous block API where internal fields can be manipulated and rq->data_len carries residual count on completion.
[ Impact: hide internal fields, block API change ]
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
#
2e46e8b2 |
| 07-May-2009 |
Tejun Heo <tj@kernel.org> |
block: drop request->hard_* and *nr_sectors
struct request has had a few different ways to represent some properties of a request. ->hard_* represent block layer's view of the request progress (com
block: drop request->hard_* and *nr_sectors
struct request has had a few different ways to represent some properties of a request. ->hard_* represent block layer's view of the request progress (completion cursor) and the ones without the prefix are supposed to represent the issue cursor and allowed to be updated as necessary by the low level drivers. The thing is that as block layer supports partial completion, the two cursors really aren't necessary and only cause confusion. In addition, manual management of request detail from low level drivers is cumbersome and error-prone at the very least.
Another interesting duplicate fields are rq->[hard_]nr_sectors and rq->{hard_cur|current}_nr_sectors against rq->data_len and rq->bio->bi_size. This is more convoluted than the hard_ case.
rq->[hard_]nr_sectors are initialized for requests with bio but blk_rq_bytes() uses it only for !pc requests. rq->data_len is initialized for all request but blk_rq_bytes() uses it only for pc requests. This causes good amount of confusion throughout block layer and its drivers and determining the request length has been a bit of black magic which may or may not work depending on circumstances and what the specific LLD is actually doing.
rq->{hard_cur|current}_nr_sectors represent the number of sectors in the contiguous data area at the front. This is mainly used by drivers which transfers data by walking request segment-by-segment. This value always equals rq->bio->bi_size >> 9. However, data length for pc requests may not be multiple of 512 bytes and using this field becomes a bit confusing.
In general, having multiple fields to represent the same property leads only to confusion and subtle bugs. With recent block low level driver cleanups, no driver is accessing or manipulating these duplicate fields directly. Drop all the duplicates. Now rq->sector means the current sector, rq->data_len the current total length and rq->bio->bi_size the current segment length. Everything else is defined in terms of these three and available only through accessors.
* blk_recalc_rq_sectors() is collapsed into blk_update_request() and now handles pc and fs requests equally other than rq->sector update. This means that now pc requests can use partial completion too (no in-kernel user yet tho).
* bio_cur_sectors() is replaced with bio_cur_bytes() as block layer now uses byte count as the primary data length.
* blk_rq_pos() is now guranteed to be always correct. In-block users converted.
* blk_rq_bytes() is now guaranteed to be always valid as is blk_rq_sectors(). In-block users converted.
* blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9. More convenient one is used.
* blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const pointer to request.
[ Impact: API cleanup, single way to represent one property of a request ]
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
#
83096ebf |
| 07-May-2009 |
Tejun Heo <tj@kernel.org> |
block: convert to pos and nr_sectors accessors
With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always e
block: convert to pos and nr_sectors accessors
With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors.
While at it, drop superflous blk_rq_pos() < 0 test in swim.c.
[ Impact: use pos and nr_sectors accessors ]
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|
Revision tags: v2.6.30-rc4 |
|
#
42dad764 |
| 22-Apr-2009 |
Jerome Marchand <jmarchan@redhat.com> |
block: simplify I/O stat accounting
This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code.
Requests are accounted according to the state of t
block: simplify I/O stat accounting
This simplifies I/O stat accounting switching code and separates it completely from I/O scheduler switch code.
Requests are accounted according to the state of their request queue at the time of the request allocation. There is no need anymore to flush the request queue when switching I/O accounting state.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
show more ...
|