Searched hist:"093108 cb" (Results 1 – 1 of 1) sorted by relevance
/openbmc/linux/drivers/infiniband/hw/cxgb4/ |
H A D | cm.c | 093108cb Fri May 06 11:48:09 CDT 2016 Hariprasad S <hariprasad@chelsio.com> RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr()
Currently c4iw_peer_abort_intr() does not wake up the waiter if the endpoint state indicates we're using MPAv2 and we're currently trying to connect. This was introduced with commit 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2")
However, this original fix is flawed because it introduces a race that can cause a deadlock of the iwarp stack. Here is the race:
->local side sets up an active offload connection.
->local side sends MPA_START request.
->peer sends MPA_START response.
->local side ingress cpl thread begins processing the MPA_START response, but before it changes the state from MPA_REQ_SENT to FPDU_MODE:
->peer sends a RST which results in a ABORT_REQ_RSS. This triggers peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the stack will re-attempt the connection using MPAv1.
->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT WR.
->So the cpl thread is stuck waiting for a reply and cannot process the ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a halt because no more ingress cpls are processed by the stack...
The correct fix for the issue is to always do the wake up in c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect().
Fixes: 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|