#
2bd64307 |
| 08-Mar-2021 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: use NVME_CTRL_CMIC_ANA macro
Use the proper macro instead of hard-coded value.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
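As an illustration of the change described above, the following minimal sketch tests the ANA capability through a named constant rather than a literal. The value `1 << 3` for NVME_CTRL_CMIC_ANA is my understanding of the CMIC field layout and is not stated in the log; `cmic_reports_ana()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the ANA reporting capability is bit 3 of the CMIC byte. */
enum {
	NVME_CTRL_CMIC_ANA = 1 << 3,
};

/* Documents the literal that the named constant stands in for. */
static_assert(NVME_CTRL_CMIC_ANA == 0x08, "ANA is expected to be CMIC bit 3");

/* Hypothetical helper: true if the CMIC byte has the ANA bit set. */
static bool cmic_reports_ana(uint8_t cmic)
{
	return (cmic & NVME_CTRL_CMIC_ANA) != 0;
}
```

Using the named constant keeps the intent visible at the call site, which is the point of the patch.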
|
#
4bdf2603 |
| 09-Feb-2021 |
Filippo Sironi <sironi@amazon.de> |
nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
Some Amazon NVMe controllers do not follow the NVMe specification and are limited to 48-bit DMA addresses. Add a quirk to force bounce buffering if needed and limit the IOVA allocation for these devices.
This affects all current Amazon NVMe controllers that expose EBS volumes (0x0061, 0x0065, 0x8061) and local instance storage (0xcd00, 0xcd01, 0xcd02).
Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
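The effect of such a quirk can be sketched in user space as follows. This is only an illustration under stated assumptions: `QUIRK_DMA_ADDRESS_BITS_48`, `pick_dma_mask()` and `needs_bounce()` are hypothetical names, and the macro merely mirrors the shape of the kernel's DMA_BIT_MASK().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical quirk bit; the real flag name and value are not given in the log. */
#define QUIRK_DMA_ADDRESS_BITS_48 (1UL << 0)

static_assert(DMA_BIT_MASK(48) == 0xFFFFFFFFFFFFULL, "48-bit mask has 48 low bits set");

/* Quirky devices get a 48-bit mask, all others the full 64 bits. */
static uint64_t pick_dma_mask(unsigned long quirks)
{
	return (quirks & QUIRK_DMA_ADDRESS_BITS_48) ? DMA_BIT_MASK(48)
						    : DMA_BIT_MASK(64);
}

/* An address with bits set above the mask cannot be given to the device. */
static bool needs_bounce(uint64_t dma_addr, uint64_t mask)
{
	return (dma_addr & ~mask) != 0;
}
```

With the 48-bit mask, any buffer whose address has a bit set at position 48 or higher would have to be bounced into lower memory first.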
|
#
ed7770f6 |
| 19-Jan-2021 |
Hannes Reinecke <hare@suse.de> |
nvme-hwmon: rework to avoid devm allocation
The original design of using device-managed resource allocation doesn't really work, as the NVMe controller has a vastly different lifetime than the hwmon sysfs attributes, causing warnings about duplicate sysfs entries upon reconnection. This patch reworks the hwmon allocation to avoid device-managed resource allocation, and uses the NVMe controller as the parent for the sysfs attributes.
Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Enzo Matsumiya <ematsumiya@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
dda3248e |
| 04-Feb-2021 |
Chao Leng <lengchao@huawei.com> |
nvme: introduce a nvme_host_path_error helper
When using nvme native multipathing, if a path related error occurs during ->queue_rq, the request needs to be completed with NVME_SC_HOST_PATH_ERROR so that the request can be failed over.
Introduce a helper to complete the command from ->queue_rq in a way that invokes nvme_complete_rq.
Signed-off-by: Chao Leng <lengchao@huawei.com> [hch: renamed, added a return value to clean up the callers a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>
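The pattern described above can be sketched with mock types. Everything prefixed `mock_` is hypothetical and stands in for the kernel's request and completion machinery; the value 0x370 for NVME_SC_HOST_PATH_ERROR is my understanding of the status code and is not stated in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed status value for a host pathing error. */
#define NVME_SC_HOST_PATH_ERROR 0x370

enum { BLK_STS_OK = 0 };

/* Hypothetical stand-in for a block layer request with NVMe status. */
struct mock_request {
	uint16_t status;
	bool completed;
};

/* Stand-in for nvme_complete_rq(); here it only records the completion. */
static void mock_complete_rq(struct mock_request *req)
{
	req->completed = true;
}

/*
 * Complete the request with a path error so that multipath can fail it
 * over, and report success so the caller can return this value from
 * its ->queue_rq handler without further error handling.
 */
static int mock_host_path_error(struct mock_request *req)
{
	assert(req);
	req->status = NVME_SC_HOST_PATH_ERROR;
	mock_complete_rq(req);
	return BLK_STS_OK;
}
```

Returning a success code after completing the request is what lets callers write `return mock_host_path_error(req);` as a one-liner, which matches the "added a return value to clean up the callers" note in the commit.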
|
#
25479069 |
| 20-Jan-2021 |
Chao Leng <lengchao@huawei.com> |
nvme-core: add cancel tagset helpers
Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and reconnection error handling.
Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
2b59787a |
| 05-Jan-2021 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
nvme: remove the unused status argument from nvme_trace_bio_complete
The only used argument in this function is the "req".
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
9b66fc02 |
| 30-Dec-2020 |
Minwoo Im <minwoo.im.dev@gmail.com> |
nvme: unexport functions with no external caller
There are no external callers of nvme_reset_ctrl_sync() and nvme_alloc_request_qid(), so there is no reason to keep the symbols exported.
Unexport those functions, mark them static and update the header file accordingly.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
2f4c9ba2 |
| 01-Dec-2020 |
Javier González <javier.gonz@samsung.com> |
nvme: export zoned namespaces without Zone Append support read-only
Allow ZNS NVMe SSDs to present a read-only namespace when append is not supported, instead of rejecting the namespace directly.
This allows (i) the namespace to be used in read-only mode, which is not a problem as the append command only affects the write path, and (ii) to use standard management tools such as nvme-cli to choose a different format or firmware slot that is compatible with the Linux zoned block device.
Signed-off-by: Javier González <javier.gonz@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
8c4dfea9 |
| 24-Nov-2020 |
Victor Gladkov <Victor.Gladkov@kioxia.com> |
nvme-fabrics: reject I/O to offline device
Commands get stuck while the host NVMe-oF controller is in the reconnect state. The controller enters the reconnect state when it loses its connection to the target. It tries to reconnect every 10 seconds (default) until it reconnects successfully or the reconnect timeout is reached. The default reconnect timeout is 10 minutes.
Applications expect commands to complete with success or error within a certain timeout (30 seconds by default). The NVMe host enforces that timeout while it is connected, but during reconnect the timeout is not enforced and commands may get stuck for a long period or even forever.
To avoid this long delay, introduce a new "fast_io_fail_tmo" session parameter. The timeout is measured in seconds from the controller reconnect, and any command beyond that timeout is rejected. The new parameter value may be passed during 'connect'. The default value of -1 means no timeout (the same as the current behavior).
Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
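The decision this parameter controls can be sketched as a small predicate. This is an illustration only: `should_fail_fast()` is a hypothetical name, the elapsed time is assumed to be counted from the moment the controller begins reconnecting, and treating the boundary (elapsed equal to the timeout) as expired is my choice rather than something the log specifies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: should a command issued while the controller
 * is reconnecting be rejected immediately?
 *
 * fast_io_fail_tmo       timeout in seconds, or -1 for "no timeout"
 * secs_since_reconnect   seconds elapsed since reconnecting began (assumed)
 */
static bool should_fail_fast(int fast_io_fail_tmo, long secs_since_reconnect)
{
	assert(secs_since_reconnect >= 0);

	/* Default of -1: keep the old behavior and never fail early. */
	if (fast_io_fail_tmo < 0)
		return false;

	return secs_since_reconnect >= fast_io_fail_tmo;
}
```

With the default of -1 the predicate is always false, which corresponds to the pre-patch behavior of holding commands until the reconnect succeeds or gives up.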
|
#
39dfe844 |
| 09-Nov-2020 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
nvme: split nvme_alloc_request()
Right now nvme_alloc_request() allocates a request from the block layer based on the value of the qid. When the qid is set to NVME_QID_ANY it uses blk_mq_alloc_request(), otherwise blk_mq_alloc_request_hctx().
The function nvme_alloc_request() is called from different contexts. The only place where it uses a non-NVME_QID_ANY value is for fabrics connect commands:
nvme_submit_sync_cmd()            NVME_QID_ANY
nvme_features()                   NVME_QID_ANY
nvme_sec_submit()                 NVME_QID_ANY
nvmf_reg_read32()                 NVME_QID_ANY
nvmf_reg_read64()                 NVME_QID_ANY
nvmf_reg_write32()                NVME_QID_ANY
nvmf_connect_admin_queue()        NVME_QID_ANY
nvme_submit_user_cmd()            NVME_QID_ANY
    nvme_alloc_request()
nvme_keep_alive()                 NVME_QID_ANY
    nvme_alloc_request()
nvme_timeout()                    NVME_QID_ANY
    nvme_alloc_request()
nvme_delete_queue()               NVME_QID_ANY
    nvme_alloc_request()
nvmet_passthru_execute_cmd()      NVME_QID_ANY
    nvme_alloc_request()
nvmf_connect_io_queue()           QID
    __nvme_submit_sync_cmd()
        nvme_alloc_request()
With passthru, nvme_alloc_request() now falls into the I/O fast path, where blk_mq_alloc_request_hctx() never gets called, so the qid check adds an extra branch to the fast path.
Split nvme_alloc_request() into nvme_alloc_request() and nvme_alloc_request_qid().
Replace each call to nvme_alloc_request() that passes NVME_QID_ANY with a call to the newly added nvme_alloc_request(), which takes no qid parameter.
In __nvme_submit_sync_cmd(), replace the call to nvme_alloc_request() that passes a qid with a call to either the newly added nvme_alloc_request() or nvme_alloc_request_qid(), depending on the qid value.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
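The shape of the split can be sketched with mock types. All `mock_` names are hypothetical; the two allocation paths stand in for blk_mq_alloc_request() and blk_mq_alloc_request_hctx(), and the mapping from qid to hardware context index (qid - 1, with qid 0 mapping to index 0) is an assumption about how queue IDs relate to hardware contexts.

```c
#include <assert.h>

#define NVME_QID_ANY (-1)

/* Which block layer allocation path a request took (illustrative only). */
enum alloc_path { ALLOC_ANY_HCTX, ALLOC_FIXED_HCTX };

struct mock_request {
	enum alloc_path path;
	unsigned int hctx_idx;
};

/* Fast path: no qid parameter and therefore no branch on it. */
static struct mock_request mock_alloc_request(void)
{
	struct mock_request rq = { ALLOC_ANY_HCTX, 0 };
	return rq;
}

/* Slow path: pin the request to the hardware context for this qid. */
static struct mock_request mock_alloc_request_qid(int qid)
{
	struct mock_request rq = { ALLOC_FIXED_HCTX, 0 };

	assert(qid >= 0);
	rq.hctx_idx = qid ? (unsigned int)qid - 1 : 0;
	return rq;
}

/* Only the sync submission helper still has to look at the qid. */
static struct mock_request mock_submit_sync_cmd(int qid)
{
	if (qid == NVME_QID_ANY)
		return mock_alloc_request();
	return mock_alloc_request_qid(qid);
}
```

The branch on the qid now lives in the one caller that can actually receive a specific qid, so the common allocation path carries no conditional.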
|
#
dc96f938 |
| 09-Nov-2020 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
nvme: use consistent macro name for timeout
This is purely a cleanup patch: add the NVME prefix to ADMIN_TIMEOUT to make it consistent with NVME_IO_TIMEOUT.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
84115d6d |
| 27-Oct-2020 |
Baolin Wang <baolin.wang@linux.alibaba.com> |
nvme: simplify nvme_req_qid()
Use the request's '->mq_hctx->queue_num' directly to simplify the nvme_req_qid() function.
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
f6224b86 |
| 13-Nov-2020 |
Keith Busch <kbusch@kernel.org> |
nvme: directly cache command effects log
Remove the struct used for tracking known command effects logs in a list. This is now saved in an xarray that doesn't use these elements. Store the log directly instead of the wrapper struct.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
04800fbf |
| 21-Oct-2020 |
Chao Leng <lengchao@huawei.com> |
nvme: introduce nvme_sync_io_queues
Introduce nvme_sync_io_queues for scenarios that only need to sync the I/O queues rather than all queues.
Signed-off-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
643c476d |
| 15-Oct-2020 |
Keith Busch <kbusch@kernel.org> |
nvme: use queuedata for nvme_req_qid
The request's rq_disk isn't set for passthrough IO commands, so tracing uses qid 0 for these which incorrectly decodes as an admin command. Use the request_queue's queuedata instead since that value is always set for the IO queues, and never set for the admin queue.
Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
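The lookup described above can be sketched with mock structures. The `mock_` types are hypothetical stand-ins for the block layer's request, request_queue and hardware context; the `+ 1` offset assumes that I/O queue IDs start at 1 because qid 0 is reserved for the admin queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the block layer structures involved. */
struct mock_hctx {
	unsigned int queue_num;
};

struct mock_queue {
	void *queuedata;	/* set for I/O queues, NULL for the admin queue */
};

struct mock_request {
	struct mock_queue *q;
	struct mock_hctx *mq_hctx;
};

/* Derive the NVMe queue ID for tracing without relying on rq_disk. */
static uint16_t mock_req_qid(const struct mock_request *req)
{
	assert(req && req->q);

	if (!req->q->queuedata)
		return 0;	/* admin queue */

	/* Assumed mapping: hardware context N serves I/O queue N + 1. */
	return (uint16_t)(req->mq_hctx->queue_num + 1);
}
```

Because queuedata is set for every request on an I/O queue, passthrough commands get the right qid even though they have no disk attached.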
|
#
d525c3c0 |
| 20-Aug-2020 |
Christoph Hellwig <hch@lst.de> |
nvme: remove the disk argument to nvme_update_zone_info
The queue can trivially be derived from the nvme_ns structure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
#
7fad20dd |
| 20-Aug-2020 |
Christoph Hellwig <hch@lst.de> |
nvme: fix initialization of the zone bitmaps
The removal of the ->revalidate_disk method broke the initialization of the zone bitmaps, as nvme_revalidate_disk now never gets called during initialization.
Move the zone related code from nvme_revalidate_disk into a new helper in zns.c, and call it from nvme_alloc_ns in addition to nvme_validate_ns to ensure the zone bitmaps are initialized during probe.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
|
#
1cf7a12e |
| 22-Sep-2020 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
nvme: use an xarray to lookup the Commands Supported and Effects log
When using a linked list, we have to open-code the locking, search, and destroy operations with loops, even if the data structure is not in the fast path.
One of the main advantages of using an XArray to store, search, and remove items is that it handles all the locking by itself, avoids the loops needed with linked lists, and provides a clear API to replace the linked list's search and destroy loops.
This patch replaces the ctrl->cel list with XArray and removes:
a. Extra code needed for the linked list for ctrl->cel item management such as nvme_find_cel().
b. Destroy loop in the nvme_free_ctrl().
c. Explicit insertion locking in the nvme_get_effects_log().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b2702aaa |
| 16-Sep-2020 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
nvme: lift the file open code from nvme_ctrl_get_by_path
Lift the file open/close code from nvme_ctrl_get_by_path into the caller, just keeping a simple nvme_ctrl_from_file() helper.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: refactored a bit, split the bug fixes into a separate prep patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
|
#
59e330f8 |
| 17-Sep-2020 |
Keith Busch <kbusch@kernel.org> |
nvme: return errors for hwmon init
Initializing the nvme hwmon retrieves a log from the controller. If the controller is broken, we need to return the appropriate error so that subsequent initialization doesn't attempt to continue.
Reported-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b63de840 |
| 28-Aug-2020 |
James Smart <james.smart@broadcom.com> |
nvme: Revert: Fix controller creation races with teardown flow
The indicated patch introduced a barrier in the sysfs_delete attribute for the controller that rejects the request if the controller isn't created. "Created" is defined as at least 1 call to nvme_start_ctrl().
This is problematic in error-injection testing. If an error occurs on the initial attempt to create an association and the controller enters reconnect(s) attempts, the admin cannot delete the controller until either there is a successful association created or ctrl_loss_tmo times out.
Where this issue is particularly hurtful is when the "admin" is the nvme-cli, it is performing a connection to a discovery controller, and it is initiated via auto-connect scripts. With the FC transport, if the first connection attempt fails, the controller enters a normal reconnect state but returns control to the cli thread that created the controller. In this scenario, the cli attempts to read the discovery log via ioctl, which fails, causing the cli to see it as an empty log and then proceeds to delete the discovery controller. The delete is rejected and the controller is left live. If the discovery controller reconnect then succeeds, there is no action to delete it, and it sits live doing nothing.
Cc: <stable@vger.kernel.org> # v5.7+ Fixes: ce1518139e69 ("nvme: Fix controller creation races with teardown flow") Signed-off-by: James Smart <james.smart@broadcom.com> CC: Israel Rukshin <israelr@mellanox.com> CC: Max Gurtovoy <maxg@mellanox.com> CC: Christoph Hellwig <hch@lst.de> CC: Keith Busch <kbusch@kernel.org> CC: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
c13f0fbc |
| 23-Aug-2020 |
Christoph Hellwig <hch@lst.de> |
nvme: don't call revalidate_disk from nvme_set_queue_dying
In nvme_set_queue_dying we really just want to ensure the disk and bdev sizes are set to zero. Going through revalidate_disk leads to a somewhat arcane and complex callchain relying on special behavior in a few places. Instead just lift the set_capacity directly to nvme_set_queue_dying, and rename and move the nvme_mpath_update_disk_size helper so that we can use it in nvme_set_queue_dying to propagate the size to the bdev without detours.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
611bee52 |
| 23-Aug-2020 |
Christoph Hellwig <hch@lst.de> |
block: replace bd_set_size with bd_set_nr_sectors
Replace bd_set_size with a version that takes the number of sectors instead, as that fits most of the current and future callers much better.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7cf0d7c0 |
| 30-Jul-2020 |
Sagi Grimberg <sagi@grimberg.me> |
nvme: have nvme_wait_freeze_timeout return if it timed out
Users can detect whether the wait has completed and take appropriate action based on this information (e.g. whether to continue initialization or rather fail and schedule another initialization attempt).
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
#
1e41f3bd |
| 18-Aug-2020 |
Christoph Hellwig <hch@lst.de> |
nvme: just check the status code type in nvme_is_path_error
Check the SCT sub-field for a path related status instead of enumerating individual status codes. As of NVMe 1.4 this adds "Internal Path Error" and "Controller Pathing Error" to the list, but it also future proofs for additional status codes added to the category.
Suggested-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
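The SCT check can be sketched as a mask-and-compare. This is an illustration under assumptions: the status word is taken to have the phase bit already shifted out, so the status code type sits in bits 10:8, and 3h is taken to be the "path related status" type. The three status constants are my understanding of the NVMe 1.4 values and are not stated in the log; `is_path_error()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: SCT occupies bits 10:8 of the status word. */
#define SCT_MASK		0x700
#define SCT_PATH_RELATED	0x300

/* Assumed NVMe 1.4 status values within the path related type. */
#define SC_INTERNAL_PATH_ERROR	0x300
#define SC_CTRL_PATH_ERROR	0x360
#define SC_HOST_PATH_ERROR	0x370

static_assert((SC_HOST_PATH_ERROR & SCT_MASK) == SCT_PATH_RELATED,
	      "host path error belongs to the path related type");

/* True for any status whose type is "path related", known or future. */
static bool is_path_error(uint16_t status)
{
	return (status & SCT_MASK) == SCT_PATH_RELATED;
}
```

Checking the type field rather than a list of codes means a status code added to this category later is classified correctly without a code change.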
|