Home
last modified time | relevance | path

Searched hist:fc2af28b (Results 1 – 1 of 1) sorted by relevance

/openbmc/linux/fs/ocfs2/cluster/
H A Dquorum.cfc2af28b Wed Jan 31 18:14:33 CST 2018 Yang Zhang <zhang.yangB@h3c.com> ocfs2/cluster: close a race that fence can't be triggered

When some nodes of cluster face with TCP connection fault, ocfs2 will
pick up a quorum to continue to work and other nodes will be fenced by
resetting host.

In order to decide which node should be fenced, ocfs2 leverages
o2quo_state::qs_holds. If that variable is reduced to zero, then a try
to decide if fence local node is performed. However, under a specific
scenario that local node is not disconnected from others at the same
time, above method has a problem to reduce ::qs_holds to zero.

Because, o2net 90s idle timer corresponding to different nodes is
triggered one after another.

node 2 node 3
90s idle timer elapses
clear ::qs_conn_bm
set hold
40s is passed
90 idle timer elapses
clear ::qs_conn_bm
set hold
still up timer elapses
clear hold (NOT to zero )
90s idle timer elapses AGAIN
still up timer elapses.
clear hold
still up timer elapses

To solve this issue, a node which has already be evicted from
::qs_conn_bm can't set hold again and again invoked from idle timer.

Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F3F93B@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Yang Zhang <zhang.yangB@h3c.com>
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fc2af28b Wed Jan 31 18:14:33 CST 2018 Yang Zhang <zhang.yangB@h3c.com> ocfs2/cluster: close a race that fence can't be triggered

When some nodes of cluster face with TCP connection fault, ocfs2 will
pick up a quorum to continue to work and other nodes will be fenced by
resetting host.

In order to decide which node should be fenced, ocfs2 leverages
o2quo_state::qs_holds. If that variable is reduced to zero, then a try
to decide if fence local node is performed. However, under a specific
scenario that local node is not disconnected from others at the same
time, above method has a problem to reduce ::qs_holds to zero.

Because, o2net 90s idle timer corresponding to different nodes is
triggered one after another.

node 2 node 3
90s idle timer elapses
clear ::qs_conn_bm
set hold
40s is passed
90 idle timer elapses
clear ::qs_conn_bm
set hold
still up timer elapses
clear hold (NOT to zero )
90s idle timer elapses AGAIN
still up timer elapses.
clear hold
still up timer elapses

To solve this issue, a node which has already be evicted from
::qs_conn_bm can't set hold again and again invoked from idle timer.

Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F3F93B@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Yang Zhang <zhang.yangB@h3c.com>
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>