#
6d2809d5 |
| 27-Feb-2017 |
Omar Sandoval <osandov@fb.com> |
blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike its blk_mq_alloc_request() counterpart. It also crashes
blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike its blk_mq_alloc_request() counterpart. It also crashes because it doesn't update the tags->rqs map.
Fix it by making it allocate a scheduler request.
Reported-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Tested-by: Sagi Grimberg <sagi@grimberg.me>
show more ...
|
#
415b806d |
| 27-Feb-2017 |
Sagi Grimberg <sagi@grimberg.me> |
blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Modified by me to also check at driver tag allocation time if th
blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Modified by me to also check at driver tag allocation time if the original request was reserved, so we can be sure to allocate a properly reserved tag at that point in time, too.
Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
0871d5a6 |
| 01-Mar-2017 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up updates
Conflicts: arch/x86/xen/setup.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.10.1 |
|
#
1802979a |
| 24-Feb-2017 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block updates and fixes from Jens Axboe:
- NVMe updates and fixes that missed the first pull request. This includes bug fixes, a
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block updates and fixes from Jens Axboe:
- NVMe updates and fixes that missed the first pull request. This includes bug fixes, and support for autonomous power management.
- Fix from Christoph for missing clear of the request payload, causing a problem with (at least) the storvsc driver.
- Further fixes for the queue/bdi life time issues from Jan.
- The Kconfig mq scheduler update from me.
- Fixing a use-after-free in dm-rq, spotted by Bart, introduced in this merge window.
- Three fixes for nbd from Josef.
- Bug fix from Omar, fixing a bug in sas transport code that oopses when bsg ioctls were used. From Omar.
- Improvements to the queue restart and tag wait from from Omar.
- Set of fixes for the sed/opal code from Scott.
- Three trivial patches to cciss from Tobin
* 'for-linus' of git://git.kernel.dk/linux-block: (41 commits) dm-rq: don't dereference request payload after ending request blk-mq-sched: separate mark hctx and queue restart operations blk-mq: use sbq wait queues instead of restart for driver tags block/sed-opal: Propagate original error message to userland. nvme/pci: re-check security protocol support after reset block/sed-opal: Introduce free_opal_dev to free the structure and clean up state nvme: detect NVMe controller in recent MacBooks nvme-rdma: add support for host_traddr nvmet-rdma: Fix error handling nvmet-rdma: use nvme cm status helper nvme-rdma: move nvme cm status helper to .h file nvme-fc: don't bother to validate ioccsz and iorcsz nvme/pci: No special case for queue busy on IO nvme/core: Fix race kicking freed request_queue nvme/pci: Disable on removal when disconnected nvme: Enable autonomous power state transitions nvme: Add a quirk mechanism that uses identify_ctrl nvme: make nvmf_register_transport require a create_ctrl callback nvme: Use CNS as 8-bit field and avoid endianness conversion nvme: add semicolon in nvme_command setting ...
show more ...
|
#
d38d3515 |
| 22-Feb-2017 |
Omar Sandoval <osandov@fb.com> |
blk-mq-sched: separate mark hctx and queue restart operations
In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart() after we dispatch requests left over on our hardware queue disp
blk-mq-sched: separate mark hctx and queue restart operations
In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart() after we dispatch requests left over on our hardware queue dispatch list. This is so we'll go back and dispatch requests from the scheduler. In this case, it's only necessary to restart the hardware queue that we are running; there's no reason to run other hardware queues just because we are using shared tags.
So, split out blk_mq_sched_mark_restart() into two operations, one for just the hardware queue and one for the whole request queue. The core code only needs the hctx variant, but I/O schedulers will want to use both.
This also requires adjusting blk_mq_sched_restart_queues() to always check the queue restart flag, not just when using shared tags.
Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
b86dd815 |
| 22-Feb-2017 |
Jens Axboe <axboe@fb.com> |
block: get rid of blk-mq default scheduler choice Kconfig entries
The wording in the entries were poor and not understandable by even deities. Kill the selection for default block scheduler, and imp
block: get rid of blk-mq default scheduler choice Kconfig entries
The wording in the entries were poor and not understandable by even deities. Kill the selection for default block scheduler, and impose a policy with sane defaults.
Architected-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
772c8f6f |
| 21-Feb-2017 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
- blk-mq scheduling framework from me and Omar, with a port of the deadline
Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
- blk-mq scheduling framework from me and Omar, with a port of the deadline scheduler for this framework. A port of BFQ from Paolo is in the works, and should be ready for 4.12.
- Various fixups and improvements to the above scheduling framework from Omar, Paolo, Bart, me, others.
- Cleanup of the exported sysfs blk-mq data into debugfs, from Omar. This allows us to export more information that helps debug hangs or performance issues, without cluttering or abusing the sysfs API.
- Fixes for the sbitmap code, the scalable bitmap code that was migrated from blk-mq, from Omar.
- Removal of the BLOCK_PC support in struct request, and refactoring of carrying SCSI payloads in the block layer. This cleans up the code nicely, and enables us to kill the SCSI specific parts of struct request, shrinking it down nicely. From Christoph mainly, with help from Hannes.
- Support for ranged discard requests and discard merging, also from Christoph.
- Support for OPAL in the block layer, and for NVMe as well. Mainly from Scott Bauer, with fixes/updates from various others folks.
- Error code fixup for gdrom from Christophe.
- cciss pci irq allocation cleanup from Christoph.
- Making the cdrom device operations read only, from Kees Cook.
- Fixes for duplicate bdi registrations and bdi/queue life time problems from Jan and Dan.
- Set of fixes and updates for lightnvm, from Matias and Javier.
- A few fixes for nbd from Josef, using idr to name devices and a workqueue deadlock fix on receive. Also marks Josef as the current maintainer of nbd.
- Fix from Josef, overwriting queue settings when the number of hardware queues is updated for a blk-mq device.
- NVMe fix from Keith, ensuring that we don't repeatedly mark and IO aborted, if we didn't end up aborting it.
- SG gap merging fix from Ming Lei for block.
- Loop fix also from Ming, fixing a race and crash between setting loop status and IO.
- Two block race fixes from Tahsin, fixing request list iteration and fixing a race between device registration and udev device add notifiations.
- Double free fix from cgroup writeback, from Tejun.
- Another double free fix in blkcg, from Hou Tao.
- Partition overflow fix for EFI from Alden Tondettar.
* tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits) nvme: Check for Security send/recv support before issuing commands. block/sed-opal: allocate struct opal_dev dynamically block/sed-opal: tone down not supported warnings block: don't defer flushes on blk-mq + scheduling blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers blk-mq: don't special case flush inserts for blk-mq-sched blk-mq-sched: don't add flushes to the head of requeue queue blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not block: do not allow updates through sysfs until registration completes lightnvm: set default lun range when no luns are specified lightnvm: fix off-by-one error on target initialization Maintainers: Modify SED list from nvme to block Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN uapi: sed-opal fix IOW for activate lsp to use correct struct cdrom: Make device operations read-only elevator: fix loading wrong elevator type for blk-mq devices cciss: switch to pci_irq_alloc_vectors block/loop: fix race between I/O and set_status blk-mq-sched: don't hold queue_lock when calling exit_icq block: set make_request_fn manually in blk_mq_update_nr_hw_queues ...
show more ...
|
Revision tags: v4.10 |
|
#
818551e2 |
| 17-Feb-2017 |
Jens Axboe <axboe@fb.com> |
Merge branch 'for-4.11/next' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
6010720d |
| 17-Feb-2017 |
Jens Axboe <axboe@fb.com> |
Merge branch 'for-4.11/block' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
64765a75 |
| 17-Feb-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
Usually we don't ask the scheduler for work, if we already have leftovers on the dispatch list. This is done to leave work on
blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
Usually we don't ask the scheduler for work, if we already have leftovers on the dispatch list. This is done to leave work on the scheduler side for as long as possible, for proper merging. But if we do have work leftover but didn't dispatch anything, then we should ask the scheduler since we could potentially issue requests from that.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
show more ...
|
#
c7a571b4 |
| 17-Feb-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: don't add flushes to the head of requeue queue
If we are currently out of driver tags, we don't want to add a new flush (without a tag) to the head of the requeue list. We want to add
blk-mq-sched: don't add flushes to the head of requeue queue
If we are currently out of driver tags, we don't want to add a new flush (without a tag) to the head of the requeue list. We want to add it to the back, behind the others that are potentially also waiting for a tag.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
show more ...
|
#
f1ba8261 |
| 07-Feb-2017 |
Paolo Valente <paolo.valente@linaro.org> |
blk-mq: pass bio to blk_mq_sched_get_rq_priv
bio is used in bfq-mq's get_rq_priv, to get the request group. We could pass directly the group here, but I thought that passing the bio was more general
blk-mq: pass bio to blk_mq_sched_get_rq_priv
bio is used in bfq-mq's get_rq_priv, to get the request group. We could pass directly the group here, but I thought that passing the bio was more general, giving the possibility to get other pieces of information if needed.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
34fe7c05 |
| 08-Feb-2017 |
Christoph Hellwig <hch@lst.de> |
block: enumify ELEVATOR_*_MERGE
Switch these constants to an enum, and make let the compiler ensure that all callers of blk_try_merge and elv_merge handle all potential values.
Signed-off-by: Chris
block: enumify ELEVATOR_*_MERGE
Switch these constants to an enum, and make let the compiler ensure that all callers of blk_try_merge and elv_merge handle all potential values.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
e4d750c9 |
| 03-Feb-2017 |
Jens Axboe <axboe@fb.com> |
block: free merged request in the caller
If we end up doing a request-to-request merge when we have completed a bio-to-request merge, we free the request from deep down in that path. For blk-mq-sche
block: free merged request in the caller
If we end up doing a request-to-request merge when we have completed a bio-to-request merge, we free the request from deep down in that path. For blk-mq-sched, the merge path has to hold the appropriate lock, but we don't need it for freeing the request. And in fact holding the lock is problematic, since we are now calling the mq sched put_rq_private() hook with the lock held. Other call paths do not hold this lock.
Fix this inconsistency by ensuring that the caller frees a merged request. Then we can do it outside of the lock, making it both more efficient and fixing the blk-mq-sched problem of invoking parts of the scheduler with an unknown lock state.
Reported-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
show more ...
|
#
0cacba6c |
| 02-Feb-2017 |
Omar Sandoval <osandov@fb.com> |
blk-mq-sched: bypass the scheduler for flushes entirely
There's a weird inconsistency that flushes are mostly hidden from the scheduler, but it needs to be aware of them in ->insert_requests(). Inst
blk-mq-sched: bypass the scheduler for flushes entirely
There's a weird inconsistency that flushes are mostly hidden from the scheduler, but it needs to be aware of them in ->insert_requests(). Instead of having every scheduler call blk_mq_sched_bypass_insert(), let's do it in the common framework.
Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
f924ba70 |
| 27-Jan-2017 |
Jens Axboe <axboe@fb.com> |
Merge branch 'for-4.11/block' into for-4.11/rq-refactor
Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
f3a8ab7d |
| 27-Jan-2017 |
Jens Axboe <axboe@fb.com> |
block: cleanup remaining manual checks for PREFLUSH|FUA
Use op_is_flush() where applicable.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
bd6737f1 |
| 27-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler inse
blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()
Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request().
Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
f73f44eb |
| 27-Jan-2017 |
Christoph Hellwig <hch@lst.de> |
block: add a op_is_flush helper
This centralizes the checks for bios that needs to be go into the flush state machine.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen
block: add a op_is_flush helper
This centralizes the checks for bios that needs to be go into the flush state machine.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
c13660a0 |
| 26-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since
blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with.
Change the function to dispatch single requests at the time. If we do that, we can backoff exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
show more ...
|
#
50e1dab8 |
| 26-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: fix starvation for multiple hardware queues and shared tags
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardwar
blk-mq-sched: fix starvation for multiple hardware queues and shared tags
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
show more ...
|
#
b48fda09 |
| 26-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: check for successful allocation before assigning tag
We don't trigger this from the normal IO path, since we always use blocking allocations from there. But Bart saw it testing multipa
blk-mq-sched: check for successful allocation before assigning tag
We don't trigger this from the normal IO path, since we always use blocking allocations from there. But Bart saw it testing multipath dm, since that is a heavy user of atomic request allocations in the map and clone path.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
5a797e00 |
| 26-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq: don't lose flags passed in to blk_mq_alloc_request()
If we come in from blk_mq_alloc_requst() with NOWAIT set in flags, we must ensure that we don't later overwrite that in blk_mq_sched_get_
blk-mq: don't lose flags passed in to blk_mq_alloc_request()
If we come in from blk_mq_alloc_requst() with NOWAIT set in flags, we must ensure that we don't later overwrite that in blk_mq_sched_get_request(). Initialize alloc_data->flags before passing it in.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
show more ...
|
#
d3484991 |
| 13-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: allow setting of default IO scheduler
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is
blk-mq-sched: allow setting of default IO scheduler
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't use for normal IO.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
show more ...
|
#
bd166ef1 |
| 17-Jan-2017 |
Jens Axboe <axboe@fb.com> |
blk-mq-sched: add framework for MQ capable IO schedulers
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a schedu
blk-mq-sched: add framework for MQ capable IO schedulers
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework.
We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interfce, and with a separate set of ops hooks for MQ devices.
We split driver and scheduler tags, so we can run the scheduling independently of device queue depth.
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
show more ...
|