Revision tags: v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7 |
|
#
e5ea42fa |
| 22-Sep-2021 |
Hannes Reinecke <hare@suse.de> |
nvme: display correct subsystem NQN
With discovery controllers supporting unique subsystem NQNs the actual subsystem NQN might be different from that one passed in via the connect args. So add a hel
nvme: display correct subsystem NQN
With discovery controllers supporting unique subsystem NQNs the actual subsystem NQN might be different from that one passed in via the connect args. So add a helper to display the resulting subsystem NQN.
Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
show more ...
|
#
6ca1d902 |
| 14-Oct-2021 |
Ming Lei <ming.lei@redhat.com> |
nvme: apply nvme API to quiesce/unquiesce admin queue
Apply the added two APIs to quiesce/unquiesce admin queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@ls
nvme: apply nvme API to quiesce/unquiesce admin queue
Apply the added two APIs to quiesce/unquiesce admin queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
5a72e899 |
| 12-Oct-2021 |
Jens Axboe <axboe@kernel.dk> |
block: add a struct io_comp_batch argument to fops->iopoll()
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of
block: add a struct io_comp_batch argument to fops->iopoll()
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO.
For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
8589bbfa |
| 05-Sep-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: fix regression that causes sporadic requests to time out
[ Upstream commit 3770a42bb8ceb856877699257a43c0585a5d2996 ]
When we queue requests, we strive to batch as much as possible and al
nvme-tcp: fix regression that causes sporadic requests to time out
[ Upstream commit 3770a42bb8ceb856877699257a43c0585a5d2996 ]
When we queue requests, we strive to batch as much as possible and also signal the network stack that more data is about to be sent over a socket with MSG_SENDPAGE_NOTLAST. This flag looks at the pending requests queued as well as queue->more_requests that is derived from the block layer last-in-batch indication.
We set more_request=true when we flush the request directly from .queue_rq submission context (in nvme_tcp_send_all), however this is wrongly assuming that no other requests may be queued during the execution of nvme_tcp_send_all.
Due to this, a race condition may happen where:
1. request X is queued as !last-in-batch 2. request X submission context calls nvme_tcp_send_all directly 3. nvme_tcp_send_all is preempted and schedules to a different cpu 4. request Y is queued as last-in-batch 5. nvme_tcp_send_all context sends request X+Y, however signals for both MSG_SENDPAGE_NOTLAST because queue->more_requests=true.
==> none of the requests is pushed down to the wire as the network stack is waiting for more data, both requests timeout.
To fix this, we eliminate queue->more_requests and only rely on the queue req_list and send_list to be not-empty.
Fixes: 122e5b9f3d37 ("nvme-tcp: optimize network stack with setting msg flags according to batch size") Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
13c80a6c |
| 05-Sep-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: fix UAF when detecting digest errors
[ Upstream commit 160f3549a907a50e51a8518678ba2dcf2541abea ]
We should also bail from the io_work loop when we set rd_enabled to true, so we don't att
nvme-tcp: fix UAF when detecting digest errors
[ Upstream commit 160f3549a907a50e51a8518678ba2dcf2541abea ]
We should also bail from the io_work loop when we set rd_enabled to true, so we don't attempt to read data from the socket when the TCP stream is already out-of-sync or corrupted.
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7a2294c5 |
| 23-Jun-2022 |
Ruozhu Li <liruozhu@huawei.com> |
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the l
nvme: fix regression when disconnect a recovering ctrl
[ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ]
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever
This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem.
Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout")
Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1e4427aa |
| 26-Jun-2022 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work an
nvme-tcp: always fail a request when sending it failed
[ Upstream commit 41d07df7de841bfbc32725ce21d933ad358f2844 ]
queue stoppage and inflight requests cancellation is fully fenced from io_work and thus failing a request from this context. Hence we don't need to try to guess from the socket retcode if this failure is because the queue is about to be torn down or not.
We are perfectly safe to just fail it, the request will not be cancelled later on.
This solves possible very long shutdown delays when the users issues a 'nvme disconnect-all'
Reported-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|