/openbmc/linux/drivers/infiniband/ulp/ipoib/ |
H A D | ipoib_multicast.c | e8224e4b Tue Sep 16 13:57:45 CDT 2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held.
The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop().
This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()").
Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> a77a57a1 Tue Aug 19 17:01:32 CDT 2008 Roland Dreier <rolandd@cisco.com> IPoIB: Fix deadlock on RTNL in ipoib_stop()
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending.
Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.
Signed-off-by: Roland Dreier <rolandd@cisco.com> e8224e4b Tue Sep 16 13:57:45 CDT 2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> a77a57a1 Tue Aug 19 17:01:32 CDT 2008 Roland Dreier <rolandd@cisco.com> IPoIB: Fix deadlock on RTNL in ipoib_stop() Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending. Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
H A D | ipoib.h | e8224e4b Tue Sep 16 13:57:45 CDT 2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held.
The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop().
This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()").
Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> e8224e4b Tue Sep 16 13:57:45 CDT 2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
H A D | ipoib_main.c | e8224e4b Tue Sep 16 13:57:45 CDT 2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held.
The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop().
This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()").
Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> a77a57a1 Tue Aug 19 17:01:32 CDT 2008 Roland Dreier <rolandd@cisco.com> IPoIB: Fix deadlock on RTNL in ipoib_stop()
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending.
Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.
Signed-off-by: Roland Dreier <rolandd@cisco.com> e8224e4b Tue Sep 16 13:57:45 CDT 2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> a77a57a1 Tue Aug 19 17:01:32 CDT 2008 Roland Dreier <rolandd@cisco.com> IPoIB: Fix deadlock on RTNL in ipoib_stop() Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending. Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|