#
a7a7cbe3 |
| 16-Oct-2017 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
nvme-pci: add SGL support
This adds SGL support for NVMe PCIe driver, based on an earlier patch from Rajiv Shanmugam Madeswaran <smrajiv15 at gmail.com>. This patch refactors the original code and adds new module parameter sgl_threshold to determine whether to use SGL or PRP for IOs.
The use of SGLs is controlled by the sgl_threshold module parameter, which allows SGLs to be used conditionally when the average request segment size (avg_seg_size) is greater than sgl_threshold. In the original patch, the decision to use SGLs depended only on the IO size; the new approach considers not only the IO size but also the number of physical segments present in the IO.
We calculate avg_seg_size based on request payload bytes and number of physical segments present in the request.
For example:
1. blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 8K, avg_seg_size = 4K: use SGL if avg_seg_size >= sgl_threshold.
2. blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 64K, avg_seg_size = 32K: use SGL if avg_seg_size >= sgl_threshold.
3. blk_rq_nr_phys_segments = 16, blk_rq_payload_bytes = 64K, avg_seg_size = 4K: use SGL if avg_seg_size >= sgl_threshold.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
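The decision rule in this commit message can be sketched in user-space C. This is only an illustration, not the driver code: the function name `use_sgl` is hypothetical, the plain integer division is an assumption (the message does not say how the average is rounded), and the `>=` comparison follows the three examples above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the SGL/PRP decision described above.
 * payload_bytes     plays the role of blk_rq_payload_bytes
 * nr_phys_segments  plays the role of blk_rq_nr_phys_segments
 */
bool use_sgl(unsigned int payload_bytes, unsigned int nr_phys_segments,
             unsigned int sgl_threshold)
{
    unsigned int avg_seg_size;

    /* A request without segments has no average segment size. */
    assert(nr_phys_segments > 0);

    /* Assumption: truncating division; the rounding is not specified. */
    avg_seg_size = payload_bytes / nr_phys_segments;

    return avg_seg_size >= sgl_threshold;
}
```

With a 32K threshold, example 2 (64K in 2 segments) would pick SGLs, while examples 1 and 3 (4K average) would fall back to PRPs.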
|
#
16772ae6 |
| 18-Oct-2017 |
Minwoo Im <dn3108@gmail.com> |
nvme-pci: fix typos in comments
fixed comment typos in adapter_alloc_cq() and adapter_alloc_sq(). 'the the' duplications are replaced with 'that the'.
Signed-off-by: Minwoo Im <dn3108@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
8969f1f8 |
| 01-Oct-2017 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: Use PCI bus address for data/queues in CMB
Currently, the NVMe PCI host driver programs the CMB dma address as the I/O SQ addresses. This results in failures on systems where 1:1 outbound mapping is not used (for example, Broadcom iProc SoCs) because the CMB BAR will be programmed with the PCI bus address but the NVMe PCI EP will try to access the CMB using the dma address.
To have CMB working on systems without 1:1 outbound mapping, we program PCI bus address for I/O SQs instead of dma address. This approach will work on systems with/without 1:1 outbound mapping.
Based on a report and previous patch from Abhishek Shah.
Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available") Cc: stable@vger.kernel.org Reported-by: Abhishek Shah <abhishek.shah@broadcom.com> Tested-by: Abhishek Shah <abhishek.shah@broadcom.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
d0877473 |
| 15-Sep-2017 |
Keith Busch <keith.busch@intel.com> |
nvme-pci: Print invalid SGL only once
The WARN_ONCE macro returns true if the condition is true, not if the warn was raised, so we're printing the scatter list every time it's invalid. This is excessive and makes debugging harder, so this patch prints it just once.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
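The pitfall described in this message can be reproduced with a small user-space sketch. Everything here is hypothetical: `warn_once` only imitates the return convention the message attributes to WARN_ONCE (it returns the condition, not whether a warning was printed), and the counters stand in for the scatter list dump. It is not the actual fix applied by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool warned;       /* set once the first warning has been printed */
static int dumps_buggy;   /* how often the buggy path "dumped" the list */
static int dumps_fixed;   /* how often the guarded path "dumped" the list */

/* Prints the message only once, but always returns the condition itself. */
bool warn_once(bool cond, const char *msg)
{
    if (cond && !warned) {
        warned = true;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

/* Buggy pattern: the dump runs every time the condition is true. */
void check_buggy(bool invalid)
{
    if (warn_once(invalid, "Invalid SGL"))
        dumps_buggy++;
}

/* Guarded pattern: the dump has its own once-only flag. */
void check_fixed(bool invalid)
{
    static bool dumped;

    if (warn_once(invalid, "Invalid SGL") && !dumped) {
        dumped = true;
        dumps_fixed++;
    }
}
```

Calling both helpers repeatedly with an invalid list shows the difference: the buggy counter grows on every call, the guarded one stops at one.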
|
#
161b8be2 |
| 14-Sep-2017 |
Keith Busch <keith.busch@intel.com> |
nvme-pci: initialize queue memory before interrupts
A spurious interrupt before the nvme driver has initialized the completion queue may inadvertently cause the driver to believe it has a completion to process. This may result in a NULL dereference since the nvmeq's tags are not set at this point.
The patch initializes the host's CQ memory so that a spurious interrupt isn't mistaken for a real completion.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
044a9df1 |
| 11-Sep-2017 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: implement the HMB entry number and size limitations
Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size and Host Memory Maximum Descriptors Entries fields that were added in TP 4002 HMB Enhancements. These allow the controller to advertise limits for the usual number of segments in the host memory buffer, as well as a minimum usable per-segment size.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
|
#
9620cfba |
| 06-Sep-2017 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: propagate (some) errors from host memory buffer setup
We want to catch command execution errors when resetting the device, so propagate errors from the Set Features when setting up the host memory buffer. We keep ignoring memory allocation failures, as the spec clearly says that the controller must work without a host memory buffer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
|
#
30f92d62 |
| 06-Sep-2017 |
Akinobu Mita <akinobu.mita@gmail.com> |
nvme-pci: use appropriate initial chunk size for HMB allocation
The initial chunk size for host memory buffer allocation is currently PAGE_SIZE << MAX_ORDER. A MAX_ORDER order allocation usually fails without CONFIG_DMA_CMA, so the HMB allocation is generally retried with chunk size PAGE_SIZE << (MAX_ORDER - 1), but there is no problem if the retry allocation works correctly.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> [hch: rebased] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
|
#
92dc6895 |
| 11-Sep-2017 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: fix host memory buffer allocation fallback
nvme_alloc_host_mem currently contains two loops that are intertwined, and the outer retry loop turns out to be broken. Fix this by untangling the two.
Based on a report and an initial patch from Akinobu Mita.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
|
#
608cc4b1 |
| 06-Sep-2017 |
Christoph Hellwig <hch@lst.de> |
nvme: fix lightnvm check
nvme_nvm_ns_supported assumes every device is a pci_dev, which leads to reading an incorrect field, or possibly even a dereference of unallocated memory for fabrics controllers.
Fix this by introducing a quirk for lightnvm-capable devices instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
#
b5d8af5b |
| 29-Aug-2017 |
Keith Busch <keith.busch@intel.com> |
nvme/pci: Use req_op to determine DIF remapping
Only read and write commands need DIF remapping. Everything else uses a passthrough integrity payload.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
4033f35d |
| 28-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: use dma memory for the host memory buffer descriptors
The NVMe 1.3 specification says in section 5.21.1.13:
"After a successful completion of a Set Features enabling the host memory buffer, the host shall not write to the associated host memory region, buffer size, or descriptor list until the host memory buffer has been disabled."
While this doesn't state that the descriptor list must remain accessible to the device, it certainly implies it must remain readable by the device.
So switch to a dma coherent allocation for the descriptor list just to be safe - it's not like the cost for it matters compared to the actual memory buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Fixes: 87ad72a59a38 ("nvme-pci: implement host memory buffer support")
|
#
5228b328 |
| 27-Aug-2017 |
Jan H. Schönherr <jschoenh@amazon.de> |
nvme: fix uninitialized prp2 value on small transfers
The value of iod->first_dma ends up as prp2 in NVMe commands. In case there is not enough data to cross a page boundary, iod->first_dma is never initialized and contains random data.
Comply with the NVMe specification and fill in 0 in that case.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
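The rule in this message (PRP2 is 0 when the data does not cross a page boundary) can be illustrated with a short sketch. The names `prp2_for_transfer` and `SKETCH_PAGE_SIZE` are hypothetical, the 4096-byte page size is an assumption, and the sketch only covers transfers that touch at most two physically contiguous pages; longer transfers need a PRP list and are out of scope here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed controller page size */

/*
 * Returns the value to place in PRP2 for a transfer of `length` bytes
 * starting at `dma_addr`, for transfers spanning at most two pages.
 */
uint64_t prp2_for_transfer(uint64_t dma_addr, uint32_t length)
{
    uint32_t offset = (uint32_t)(dma_addr & (SKETCH_PAGE_SIZE - 1));
    uint32_t first_page_bytes = SKETCH_PAGE_SIZE - offset;

    assert(length > 0);

    /* Everything fits in the first page: PRP2 is unused, so fill in 0. */
    if (length <= first_page_bytes)
        return 0;

    /* Data continues at the start of the next page. */
    return (dma_addr - offset) + SKETCH_PAGE_SIZE;
}
```

The point of the early return is that the second pointer is always written explicitly, so it never carries whatever value happened to be in memory.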
|
#
34b6c231 |
| 10-Jul-2017 |
Sagi Grimberg <sagi@grimberg.me> |
nvme: Add admin_tagset pointer to nvme_ctrl
Will be used when we centralize control flows.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
e9d8a0fd |
| 17-Aug-2017 |
Keith Busch <keith.busch@intel.com> |
nvme-pci: set cqe_seen on polled completions
Fixes: 920d13a884 ("nvme-pci: factor out the cqe reading mechanics from __nvme_process_cq") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
1c78f773 |
| 29-Jul-2017 |
Max Gurtovoy <maxg@mellanox.com> |
nvme-pci: fix CMB sysfs file removal in reset path
Currently we create the sysfs entry even if we fail to map the CMB. In that case, the unmapping will not remove the created sysfs file. There is no good reason to create a sysfs entry for a non-working CMB and show its characteristics.
Fixes: f63572dff ("nvme: unmap CMB and remove sysfs file in reset path") Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
50cdb7c6 |
| 25-Jul-2017 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: fix HMB size calculation
It's possible the preferred HMB size may not be a multiple of the chunk_size. This patch moves len to function scope and uses that in the for loop increment so the last iteration doesn't cause the total size to exceed the allocated HMB size.
Based on an earlier patch from Keith Busch.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Fixes: 87ad72a59a38 ("nvme-pci: implement host memory buffer support")
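The size accounting problem described here can be shown with two small loops. Both functions are hypothetical illustrations, not the driver code: `hmb_naive_size` advances the running total by the full chunk size on every iteration, while `hmb_planned_size` keeps the per-iteration length in a function-scope variable and advances by that, as the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive accounting: overshoots when preferred is not a chunk multiple. */
uint64_t hmb_naive_size(uint64_t preferred, uint64_t chunk_size)
{
    uint64_t size;

    assert(chunk_size > 0);
    for (size = 0; size < preferred; size += chunk_size)
        ; /* each iteration would allocate one chunk */
    return size;
}

/* Accounting by the actual length of each chunk, clamped on the last one. */
uint64_t hmb_planned_size(uint64_t preferred, uint64_t chunk_size,
                          unsigned int *nr_chunks)
{
    uint64_t size, len = 0;
    unsigned int n = 0;

    assert(chunk_size > 0);
    for (size = 0; size < preferred; size += len) {
        uint64_t remaining = preferred - size;

        len = remaining < chunk_size ? remaining : chunk_size;
        n++;
    }
    if (nr_chunks != NULL)
        *nr_chunks = n;
    return size;
}
```

For a preferred size of 10 units and a chunk size of 4, the naive loop reports 12 while the clamped loop reports exactly 10 across three chunks.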
|
#
b00c9b7a |
| 16-Jul-2017 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
nvme-pci: Fix an error handling path in 'nvme_probe()'
Release resources in the correct order in order not to miss a 'put_device()' if 'nvme_dev_map()' fails.
Fixes: b00a726a9fd8 ("NVMe: Don't unmap controller registers on reset") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
86eea289 |
| 12-Jul-2017 |
Keith Busch <keith.busch@intel.com> |
nvme-pci: Remove nvme_setup_prps BUG_ON
This patch replaces the invalid nvme SGL kernel panic with a warning, and returns an appropriate error. The warning will occur only on the first occurrence, and sgl details will be printed to help debug how the request was allowed to form.
Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f99cb7af |
| 10-Jul-2017 |
David Wayne Fugate <david.fugate@intel.com> |
nvme-pci: add another device ID with stripe quirk
Adds a fourth Intel controller which has the "stripe" quirk.
Signed-off-by: David Wayne Fugate <david.fugate@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b27c1e68 |
| 10-Jul-2017 |
weiping zhang <zhangweiping@didichuxing.com> |
nvme-pci: add module parameter for io queue depth
Adjust io queue depth more easily, and make sure io queue depth >= 2.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
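The lower bound mentioned in this message (io queue depth >= 2) amounts to validating the parameter when it is set. A minimal user-space sketch, assuming a hypothetical helper `parse_io_queue_depth` that takes the value as a string; the real module parameter plumbing is not shown in the message and is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical validator: parses a decimal queue depth and rejects
 * anything that is not a number or is smaller than 2.
 * Returns 0 on success and stores the value in *depth, else -EINVAL.
 */
int parse_io_queue_depth(const char *val, int *depth)
{
    char *end;
    long n;

    assert(val != NULL && depth != NULL);

    errno = 0;
    n = strtol(val, &end, 10);
    if (errno != 0 || end == val || *end != '\0')
        return -EINVAL;
    if (n < 2 || n > INT_MAX)
        return -EINVAL;

    *depth = (int)n;
    return 0;
}
```

Rejecting the value at set time keeps the rest of the code free of range checks.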
|
#
2ee0e4ed |
| 06-Jul-2017 |
Dan Carpenter <dan.carpenter@oracle.com> |
nvme-pci: compile warnings in nvme_alloc_host_mem()
"i" should be signed or it could cause a forever loop on the cleanup path. "size" can be used uninitialized.
Fixes: 87ad72a59a38 ("nvme-pci: implement host memory buffer support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
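The "forever loop" mentioned in this message comes from counting down with an unsigned index: a condition such as `i >= 0` is always true for an unsigned type, so the loop never terminates. A minimal sketch of a cleanup loop with a signed counter, using a hypothetical `release_slots` helper in place of the real cleanup path:

```c
#include <assert.h>

/*
 * Walks the allocated slots from the last one down to index 0 and
 * clears each of them. Returns the number of slots released.
 */
int release_slots(int *slots, int nr_allocated)
{
    /*
     * Signed on purpose: with an unsigned counter, `i >= 0` would
     * always hold and the decrement past 0 would wrap around.
     */
    int i;
    int released = 0;

    assert(nr_allocated >= 0);

    for (i = nr_allocated - 1; i >= 0; i--) {
        slots[i] = 0;
        released++;
    }
    return released;
}
```

With a signed index, the case of zero allocated slots also works naturally, because the loop starts at -1 and exits immediately.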
|
#
d09f2b45 |
| 02-Jul-2017 |
Sagi Grimberg <sagi@grimberg.me> |
nvme: split nvme_uninit_ctrl into stop and uninit
Usually before we tear down the controller we want to:
1. complete/cancel any ctrl inflight works
2. remove ctrl namespaces (only for removal though, resets shouldn't remove any namespaces).
But we do not want to destroy the controller device, as we might use it for logging during the teardown stage.
This patch adds nvme_start_ctrl(), which queues inflight controller works (aen, ns scan, queue start and keep-alive if kato is set), and nvme_stop_ctrl(), which cancels the works. Namespace removal is left to the callers to handle.
Move nvme_uninit_ctrl after we are done with the controller device.
Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
#
c81545f9 |
| 02-Jul-2017 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues, quiescing/unquiescing respects the submission path RCU grace period.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
#
775755ed |
| 01-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
PCI: Split ->reset_notify() method into ->reset_prepare() and ->reset_done()
The pci_error_handlers->reset_notify() method had a flag to indicate whether to prepare for or clean up after a reset. The prepare and done cases have no shared functionality whatsoever, so split them into separate methods.
[bhelgaas: changelog, update locking comments] Link: http://lkml.kernel.org/r/20170601111039.8913-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|