#
487cf914 |
| 12-Oct-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: remove unnecessary setting for slot Since slot will be set within _sendmsg, we can remove the redundant code in resync_info_update. Signed-off-by: Guoqing Jiang <gqj
md-cluster: remove unnecessary setting for slot Since slot will be set within _sendmsg, we can remove the redundant code in resync_info_update. Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
show more ...
|
#
faeff83f |
| 12-Oct-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: make other members of cluster_msg is handled by little endian funcs Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
|
#
d216711b |
| 12-Oct-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Do not printk() every received message The receive daemon prints kernel messages for every network message received. This would fill the kernel message log with unnecessary m
md-cluster: Do not printk() every received message The receive daemon prints kernel messages for every network message received. This would fill the kernel message log with unnecessary messages. Remove the pr_info() messages. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
show more ...
|
#
dbb64f86 |
| 01-Oct-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Fix adding of new disk with new reload code Adding the disk worked incorrectly with the new reload code. Fix it: - No operation should be performed on rdev marked as Ca
md-cluster: Fix adding of new disk with new reload code Adding the disk worked incorrectly with the new reload code. Fix it: - No operation should be performed on rdev marked as Candidate - After a metadata update operation, kick disk if role is 0xfffe else clear Candidate bit and continue with the regular change check. - Saving the mode of the lock resource to check if token lock is already locked, because it can be called twice while adding a disk. However, unlock_comm() must be called only once. - add_new_disk() is called by the node initiating the --add operation. If it needs to be canceled, call add_new_disk_cancel(). The operation is completed by md_update_sb() which will write and unlock the communication. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
show more ...
|
#
c186b128 |
| 30-Sep-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Perform resync/recovery under a DLM lock Resync or recovery must be performed by only one node at a time. A DLM lock resource, resync_lockres provides the mutual exclusion
md-cluster: Perform resync/recovery under a DLM lock Resync or recovery must be performed by only one node at a time. A DLM lock resource, resync_lockres provides the mutual exclusion so that only one node performs the recovery/resync at a time. If a node is unable to get the resync_lockres, because recovery is being performed by another node, it set MD_RECOVER_NEEDED so as to schedule recovery in the future. Remove the debug message in resync_info_update() used during development. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
show more ...
|
Revision tags: v4.3-rc1, v4.2, v4.2-rc8 |
|
#
70bcecdb |
| 21-Aug-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Improve md_reload_sb to be less error prone md_reload_sb is too simplistic and it explicitly needs to determine the changes made by the writing node. However, there are multi
md-cluster: Improve md_reload_sb to be less error prone md_reload_sb is too simplistic and it explicitly needs to determine the changes made by the writing node. However, there are multiple areas where a simple reload could fail. Instead, read the superblock of one of the "good" rdevs and update the necessary information: - read the superblock into a newly allocated page, by temporarily swapping out rdev->sb_page and calling ->load_super. - if that fails return - if it succeeds, call check_sb_changes 1. iterates over list of active devices and checks the matching dev_roles[] value. If that is 'faulty', the device must be marked as faulty - call md_error to mark the device as faulty. Make sure not to set CHANGE_DEVS and wakeup mddev->thread or else it would initiate a resync process, which is the responsibility of the "primary" node. - clear the Blocked bit - Call remove_and_add_spares() to hot remove the device. If the device is 'spare': - call remove_and_add_spares() to get the number of spares added in this operation. - Reduce mddev->degraded to mark the array as not degraded. 2. reset recovery_cp - read the rest of the rdevs to update recovery_offset. If recovery_offset is equal to MaxSector, call spare_active() to set it In_sync This required that recovery_offset be initialized to MaxSector, as opposed to zero so as to communicate the end of sync for a rdev. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
show more ...
|
#
b8ca846e |
| 09-Oct-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Wake up suspended process When the suspended_area is deleted, the suspended processes must be woken up in order to complete their I/O. Signed-off-by: Goldwyn Rodrigu
md-cluster: Wake up suspended process When the suspended_area is deleted, the suspended processes must be woken up in order to complete their I/O. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
show more ...
|
#
09995411 |
| 30-Sep-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: send BITMAP_NEEDS_SYNC when node is leaving cluster Previously, BITMAP_NEEDS_SYNC message is sent when the resyc aborts, but it could abort for different reasons, and not all
md-cluster: send BITMAP_NEEDS_SYNC when node is leaving cluster Previously, BITMAP_NEEDS_SYNC message is sent when the resyc aborts, but it could abort for different reasons, and not all of reasons require another node to take over the resync ownship. It is better make BITMAP_NEEDS_SYNC message only be sent when the node is leaving cluster with dirty bitmap. And we also need to ensure dlm connection is ok. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
c40f341f |
| 18-Aug-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Use a small window for resync Suspending the entire device for resync could take too long. Resync in small chunks. cluster's resync window (32M) is maintained in r1c
md-cluster: Use a small window for resync Suspending the entire device for resync could take too long. Resync in small chunks. cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window: 1. Set the cluster_sync_low to curr_resync_completed. 2. Check if the sync will fit in the new window, if not issue a wait_barrier() and set cluster_sync_low to sector_nr. 3. Set cluster_sync_high to cluster_sync_low + resync_window. 4. Send a message to all nodes so they may add it in their suspension list. bitmap_cond_end_sync is modified to allow to force a sync inorder to get the curr_resync_completed uptodate with the sector passed. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
show more ...
|
Revision tags: v4.2-rc7 |
|
#
9ed38ff5 |
| 14-Aug-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: complete all write requests before adding suspend_info process_suspend_info - which handles the RESYNCING request - must not reply until all writes which were initiated befor
md-cluster: complete all write requests before adding suspend_info process_suspend_info - which handles the RESYNCING request - must not reply until all writes which were initiated before the request arrived, have completed. As a by-product, all process_* functions now take mddev as their first arguement making it uniform. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
18b9f679 |
| 13-Aug-2015 |
NeilBrown <neilb@suse.com> |
md-cluster: remove inappropriate try_module_get from join() md_setup_cluster already calls try_module_get(), so this try_module_get isn't needed. Also, there is no matching module_pu
md-cluster: remove inappropriate try_module_get from join() md_setup_cluster already calls try_module_get(), so this try_module_get isn't needed. Also, there is no matching module_put (except in error patch), so this leaves an unbalanced module count. Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
Revision tags: v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2 |
|
#
abb9b22a |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: Read the disk bitmap sb and check if it needs recovery In gather_all_resync_info, we need to read the disk bitmap sb and check if it needs recovery. Reviewed-by: Gol
md-cluster: Read the disk bitmap sb and check if it needs recovery In gather_all_resync_info, we need to read the disk bitmap sb and check if it needs recovery. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
eece075c |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: only call complete(&cinfo->completion) when node join cluster Introduce MD_CLUSTER_BEGIN_JOIN_CLUSTER flag to make sure complete(&cinfo->completion) is only be invoked when n
md-cluster: only call complete(&cinfo->completion) when node join cluster Introduce MD_CLUSTER_BEGIN_JOIN_CLUSTER flag to make sure complete(&cinfo->completion) is only be invoked when node join cluster. Otherwise node failure could also call the complete, and it doesn't make sense to do it. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
6e6d9f2c |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: add missed lockres_free We also need to free the lock resource before goto out. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqji
md-cluster: add missed lockres_free We also need to free the lock resource before goto out. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
b2b9bfff |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: remove the unused sb_lock The sb_lock is not used anywhere, so let's remove it. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqji
md-cluster: remove the unused sb_lock The sb_lock is not used anywhere, so let's remove it. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
9e3072e3 |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: init suspend_list and suspend_lock early in join If the node just join the cluster, and receive the msg from other nodes before init suspend_list, it will cause kernel crash
md-cluster: init suspend_list and suspend_lock early in join If the node just join the cluster, and receive the msg from other nodes before init suspend_list, it will cause kernel crash due to NULL pointer dereference, so move the initializations early to fix the bug. md-cluster: Joined cluster 3578507b-e0cb-6d4f-6322-696cd7b1b10c slot 3 BUG: unable to handle kernel NULL pointer dereference at (null) ... ... ... Call Trace: [<ffffffffa0444924>] process_recvd_msg+0x2e4/0x330 [md_cluster] [<ffffffffa0444a06>] recv_daemon+0x96/0x170 [md_cluster] [<ffffffffa045189d>] md_thread+0x11d/0x170 [md_mod] [<ffffffff810768c4>] kthread+0xb4/0xc0 [<ffffffff8151927c>] ret_from_fork+0x7c/0xb0 ... ... ... RIP [<ffffffffa0443581>] __remove_suspend_info+0x11/0xa0 [md_cluster] Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
b5ef5678 |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: add the error check if failed to get dlm lock In complicated cluster environment, it is possible that the dlm lock couldn't be get/convert on purpose, the related err inf
md-cluster: add the error check if failed to get dlm lock In complicated cluster environment, it is possible that the dlm lock couldn't be get/convert on purpose, the related err info is added for better debug potential issue. For lockres_free, if the lock is blocking by a lock request or conversion request, then dlm_unlock just put it back to grant queue, so need to ensure the lock is free finally. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
b83d51c0 |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: init completion within lockres_init We should init completion within lockres_init, otherwise completion could be initialized more than one time during it's life cycle.
md-cluster: init completion within lockres_init We should init completion within lockres_init, otherwise completion could be initialized more than one time during it's life cycle. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
66099bb0 |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: fix deadlock issue on message lock There is problem with previous communication mechanism, and we got below deadlock scenario with cluster which has 3 nodes.
md-cluster: fix deadlock issue on message lock There is problem with previous communication mechanism, and we got below deadlock scenario with cluster which has 3 nodes. Sender Receiver Receiver token(EX) message(EX) writes message downconverts message(CR) requests ack(EX) get message(CR) gets message(CR) reads message reads message requests EX on message requests EX on message To fix this problem, we do the following changes: 1. the sender downconverts MESSAGE to CW rather than CR. 2. and the receiver request PR lock not EX lock on message. And in case we failed to down-convert EX to CW on message, it is better to unlock message otherthan still hold the lock. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Lidong Zhong <ldzhong@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
dc737d7c |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: transfer the resync ownership to another node When node A stops an array while the array is doing a resync, we need to let another node B take over the resync task.
md-cluster: transfer the resync ownership to another node When node A stops an array while the array is doing a resync, we need to let another node B take over the resync task. To achieve the goal, we need the A send an explicit BITMAP_NEEDS_SYNC message to the cluster. And the node B which received that message will invoke __recover_slot to do resync. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
05cd0e51 |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: split recover_slot for future code reuse Make recover_slot as a wraper to __recover_slot, since the logic of __recover_slot can be reused for the condition when other nod
md-cluster: split recover_slot for future code reuse Make recover_slot as a wraper to __recover_slot, since the logic of __recover_slot can be reused for the condition when other nodes need to take over the resync job. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
#
b89f704a |
| 10-Jul-2015 |
Guoqing Jiang <gqjiang@suse.com> |
md-cluster: use %pU to print UUIDs Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>
|
Revision tags: v4.2-rc1 |
|
#
90382ed9 |
| 24-Jun-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
Fix read-balancing during node failure During a node failure, We need to suspend read balancing so that the reads are directed to the first device and stale data is not read. Suspend
Fix read-balancing during node failure During a node failure, We need to suspend read balancing so that the reads are directed to the first device and stale data is not read. Suspending writes is not required because these would be recorded and synced eventually. A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep(). area_resyncing() will respond true for the entire devices if this flag is set and the request type is READ. The flag is cleared in recover_done(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reported-By: David Teigland <teigland@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
show more ...
|
Revision tags: v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1 |
|
#
97f6cd39 |
| 14-Apr-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.de> |
md-cluster: re-add capabilities When "re-add" is writted to /sys/block/mdXX/md/dev-YYY/state, the clustered md: 1. Sends RE_ADD message with the desc_nr. Nodes receiving the mes
md-cluster: re-add capabilities When "re-add" is writted to /sys/block/mdXX/md/dev-YYY/state, the clustered md: 1. Sends RE_ADD message with the desc_nr. Nodes receiving the message clear the Faulty bit in their respective rdev->flags. 2. The node initiating re-add, gathers the bitmaps of all nodes and copies them into the local bitmap. It does not clear the bitmap from which it is copying. 3. Initiating node schedules a md recovery to sync the devices. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
show more ...
|
#
88bcfef7 |
| 14-Apr-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.de> |
md-cluster: remove capabilities This adds "remove" capabilities for the clustered environment. When a user initiates removal of a device from the array, a REMOVE message with disk nu
md-cluster: remove capabilities This adds "remove" capabilities for the clustered environment. When a user initiates removal of a device from the array, a REMOVE message with disk number in the array is sent to all the nodes which kick the respective device in their own array. This facilitates the removal of failed devices. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
show more ...
|