raid1.c - OpenGrok history log for /openbmc/linux/drivers/md/raid1.c

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# c186b128	30-Sep-2015	Goldwyn Rodrigues <rgoldwyn@suse.com>	md-cluster: Perform resync/recovery under a DLM lock Resync or recovery must be performed by only one node at a time. A DLM lock resource, resync_lockres provides the mutual exclusion so that only o md-cluster: Perform resync/recovery under a DLM lock Resync or recovery must be performed by only one node at a time. A DLM lock resource, resync_lockres provides the mutual exclusion so that only one node performs the recovery/resync at a time. If a node is unable to get the resync_lockres, because recovery is being performed by another node, it set MD_RECOVER_NEEDED so as to schedule recovery in the future. Remove the debug message in resync_info_update() used during development. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> show more ...
Revision tags: v4.3-rc1, v4.2, v4.2-rc8
# 70bcecdb	21-Aug-2015	Goldwyn Rodrigues <rgoldwyn@suse.com>	md-cluster: Improve md_reload_sb to be less error prone md_reload_sb is too simplistic and it explicitly needs to determine the changes made by the writing node. However, there are multiple areas wh md-cluster: Improve md_reload_sb to be less error prone md_reload_sb is too simplistic and it explicitly needs to determine the changes made by the writing node. However, there are multiple areas where a simple reload could fail. Instead, read the superblock of one of the "good" rdevs and update the necessary information: - read the superblock into a newly allocated page, by temporarily swapping out rdev->sb_page and calling ->load_super. - if that fails return - if it succeeds, call check_sb_changes 1. iterates over list of active devices and checks the matching dev_roles[] value. If that is 'faulty', the device must be marked as faulty - call md_error to mark the device as faulty. Make sure not to set CHANGE_DEVS and wakeup mddev->thread or else it would initiate a resync process, which is the responsibility of the "primary" node. - clear the Blocked bit - Call remove_and_add_spares() to hot remove the device. If the device is 'spare': - call remove_and_add_spares() to get the number of spares added in this operation. - Reduce mddev->degraded to mark the array as not degraded. 2. reset recovery_cp - read the rest of the rdevs to update recovery_offset. If recovery_offset is equal to MaxSector, call spare_active() to set it In_sync This required that recovery_offset be initialized to MaxSector, as opposed to zero so as to communicate the end of sync for a rdev. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> show more ...
# c40f341f	18-Aug-2015	Goldwyn Rodrigues <rgoldwyn@suse.com>	md-cluster: Use a small window for resync Suspending the entire device for resync could take too long. Resync in small chunks. cluster's resync window (32M) is maintained in r1conf as cluster_sync_ md-cluster: Use a small window for resync Suspending the entire device for resync could take too long. Resync in small chunks. cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window: 1. Set the cluster_sync_low to curr_resync_completed. 2. Check if the sync will fit in the new window, if not issue a wait_barrier() and set cluster_sync_low to sector_nr. 3. Set cluster_sync_high to cluster_sync_low + resync_window. 4. Send a message to all nodes so they may add it in their suspension list. bitmap_cond_end_sync is modified to allow to force a sync inorder to get the curr_resync_completed uptodate with the sector passed. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de> show more ...
# a452744b	01-Oct-2015	Mikulas Patocka <mpatocka@redhat.com>	crash in md-raid1 and md-raid10 due to incorrect list manipulation The commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure device failure recorded before write request returns) is caus crash in md-raid1 and md-raid10 due to incorrect list manipulation The commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure device failure recorded before write request returns) is causing crash in the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100% reproducible. The reason for the crash is that the newly added code in raid1d moves the list from conf->bio_end_io_list to tmp, then tests if tmp is non-empty and then incorrectly pops the bio from conf->bio_end_io_list (which is empty because the list was alrady moved). Raid-10 has a similar bug. Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000) CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35 task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111111000001111 Not tainted r00-03 000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38 r04-07 000000001059d000 000000007f16be08 0000000000200200 0000000000000001 r08-11 000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000 r12-15 000000004056f320 0000000000000000 0000000000013dd0 0000000000000000 r16-19 00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001 r20-23 000000000800000f 0000000042200390 0000000000000000 0000000000000000 r24-27 0000000000000001 000000000800000f 000000007f16be08 000000001059d000 r28-31 0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000 sr00-03 0000000000249800 0000000000000000 0000000000000000 0000000000249800 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620 IIR: 0f8010c6 ISR: 0000000000000000 IOR: 0000000100000000 CPU: 3 CR30: 000000006ccb8000 CR31: 0000000000000000 ORIG_R28: 000000001059d000 IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1] IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1] RP(r2): raid_end_bio_io+0x88/0x168 [raid1] Backtrace: [<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1] [<00000000105a4f64>] raid1d+0x144/0x1640 [raid1] [<000000004017fd5c>] kthread+0x144/0x160 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.") Fixes: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.") Signed-off-by: NeilBrown <neilb@suse.com> show more ...
# e8ff8bf0	16-Sep-2015	Jes Sorensen <Jes.Sorensen@redhat.com>	md/raid1: Avoid raid1 resync getting stuck close_sync() needs to set conf->next_resync to a large, but safe value below MaxSector and use it to determine whether or not to set start_next_window in w md/raid1: Avoid raid1 resync getting stuck close_sync() needs to set conf->next_resync to a large, but safe value below MaxSector and use it to determine whether or not to set start_next_window in wait_barrier() Solution suggested by Neil Brown. Reported-by: Nate Dailey <nate.dailey@stratus.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> show more ...
# 644df1a8	13-Sep-2015	Julia Lawall <Julia.Lawall@lip6.fr>	md: drop null test before destroy functions Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != md: drop null test before destroy functions Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\\|mempool_destroy\\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NeilBrown <neilb@suse.com> show more ...
Revision tags: v4.2-rc7
# 55ce74d4	13-Aug-2015	NeilBrown <neilb@suse.com>	md/raid1: ensure device failure recorded before write request returns. When a write to one of the legs of a RAID1 fails, the failure is recorded in the metadata of the other leg(s) so that after a r md/raid1: ensure device failure recorded before write request returns. When a write to one of the legs of a RAID1 fails, the failure is recorded in the metadata of the other leg(s) so that after a restart the data on the failed drive wont be trusted even if that drive seems to be working again (maybe a cable was unplugged). Similarly when we record a bad-block in response to a write failure, we must not let the write complete until the bad-block update is safe. Currently there is no interlock between the write request completing and the metadata update. So it is possible that the write will complete, the app will confirm success in some way, and then the machine will crash before the metadata update completes. This is an extremely small hole for a racy to fit in, but it is theoretically possible and so should be closed. So: - set MD_CHANGE_PENDING when requesting a metadata update for a failed device, so we can know with certainty when it completes - queue requests that experienced an error on a new queue which is only processed after the metadata update completes - call raid_end_bio_io() on bios in that queue when the time comes. Signed-off-by: NeilBrown <neilb@suse.com> show more ...
Revision tags: v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2
# 985ca973	05-Jul-2015	NeilBrown <neilb@suse.com>	md: close some races between setting and checking sync_action. When checking sync_action in a script, we want to be sure it is as accurate as possible. As resync/reshape etc doesn't always start imm md: close some races between setting and checking sync_action. When checking sync_action in a script, we want to be sure it is as accurate as possible. As resync/reshape etc doesn't always start immediately (a separate thread is scheduled to do it), it is best if 'action_show' checks if MD_RECOVER_NEEDED is set (which it does) and in that case reports what is likely to start soon (which it only sometimes does). So: - report 'reshape' if reshape_position suggests one might start. - set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very likely to happen next. Signed-off-by: NeilBrown <neilb@suse.com> show more ...
Revision tags: v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2
# 8ae12666	28-Apr-2015	Kent Overstreet <kent.overstreet@gmail.com>	block: kill merge_bvec_fn() completely As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bve block: kill merge_bvec_fn() completely As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com> show more ...
# 423f04d6	26-Jul-2015	NeilBrown <neilb@suse.com>	md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies raid1_end_read_request() assumes that the In_sync bits are consistent with the ->degaded count. raid1_spare_active md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies raid1_end_read_request() assumes that the In_sync bits are consistent with the ->degaded count. raid1_spare_active updates the In_sync bit before the ->degraded count and so exposes an inconsistency, as does error() So extend the spinlock in raid1_spare_active() and error() to hide those inconsistencies. This should probably be part of Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from last working device'.") as it addresses the same issue. It fixes the same bug and should go to -stable for same reasons. Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com> show more ...
# b7c44ed9	24-Jul-2015	Jens Axboe <axboe@fb.com>	block: manipulate bio->bi_flags through helpers Some places use helpers now, others don't. We only have the 'is set' helper, add helpers for setting and clearing flags too. It was a bit of a mess o block: manipulate bio->bi_flags through helpers Some places use helpers now, others don't. We only have the 'is set' helper, add helpers for setting and clearing flags too. It was a bit of a mess of atomic vs non-atomic access. With BIO_UPTODATE gone, we don't have any risk of concurrent access to the flags. So relax the restriction and don't make any of them atomic. The flags that do have serialization issues (reffed and chained), we already handle those separately. Signed-off-by: Jens Axboe <axboe@fb.com> show more ...
# 4246a0b6	20-Jul-2015	Christoph Hellwig <hch@lst.de>	block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the b block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com> show more ...
# 90382ed9	24-Jun-2015	Goldwyn Rodrigues <rgoldwyn@suse.com>	Fix read-balancing during node failure During a node failure, We need to suspend read balancing so that the reads are directed to the first device and stale data is not read. Suspending writes is no Fix read-balancing during node failure During a node failure, We need to suspend read balancing so that the reads are directed to the first device and stale data is not read. Suspending writes is not required because these would be recorded and synced eventually. A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep(). area_resyncing() will respond true for the entire devices if this flag is set and the request type is READ. The flag is cleared in recover_done(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reported-By: David Teigland <teigland@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> show more ...
# 34cab6f4	23-Jul-2015	NeilBrown <neilb@suse.com>	md/raid1: fix test for 'was read error from last working device'. When we get a read error from the last working device, we don't try to repair it, and don't fail the device. We simple report a rea md/raid1: fix test for 'was read error from last working device'. When we get a read error from the last working device, we don't try to repair it, and don't fail the device. We simple report a read error to the caller. However the current test for 'is this the last working device' is wrong. When there is only one fully working device, it assumes that a non-faulty device is that device. However a spare which is rebuilding would be non-faulty but so not the only working device. So change the test from "!Faulty" to "In_sync". If ->degraded says there is only one fully working device and this device is in_sync, this must be the one. This bug has existed since we allowed read_balance to read from a recovering spare in v3.0 Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com> show more ...
# 4452226e	22-May-2015	Tejun Heo <tj@kernel.org>	writeback: move backing_dev_info->state into bdi_writeback Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for wri writeback: move backing_dev_info->state into bdi_writeback Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->state into wb. * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_. * Explicit zeroing of bdi->state is removed without adding zeoring of wb->state as the whole data structure is zeroed on init anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com> show more ...
Revision tags: v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1
# 09314799	18-Feb-2015	NeilBrown <neilb@suse.de>	md: remove 'go_faster' option from ->sync_request() This option is not well justified and testing suggests that it hardly ever makes any difference. The comment suggests there might be a need to wa md: remove 'go_faster' option from ->sync_request() This option is not well justified and testing suggests that it hardly ever makes any difference. The comment suggests there might be a need to wait for non-resync activity indicated by ->nr_waiting, however raise_barrier() already waits for all of that. So just remove it to simplify reasoning about speed limiting. This allows us to remove a 'FIXME' comment from raid5.c as that never used the flag. Signed-off-by: NeilBrown <neilb@suse.de> show more ...
# d1901ef0	22-Feb-2015	Tomáš Hodek <tomas.hodek@volny.cz>	md/raid1: fix read balance when a drive is write-mostly. When a drive is marked write-mostly it should only be the target of reads if there is no other option. This behaviour was broken by commit md/raid1: fix read balance when a drive is write-mostly. When a drive is marked write-mostly it should only be the target of reads if there is no other option. This behaviour was broken by commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc md/raid1: read balance chooses idlest disk for SSD which causes a write-mostly device to be preferred is some cases. Restore correct behaviour by checking and setting best_dist_disk and best_pending_disk rather than best_disk. We only need to test one of these as they are both changed from -1 or >=0 at the same time. As we leave min_pending and best_dist unchanged, any non-write-mostly device will appear better than the write-mostly device. Reported-by: Tomáš Hodek <tomas.hodek@volny.cz> Reported-by: Dark Penguin <darkpenguin@yandex.ru> Signed-off-by: NeilBrown <neilb@suse.de> Link: http://marc.info/?l=linux-raid&m=135982797322422 Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc Cc: stable@vger.kernel.org (3.6+) show more ...
Revision tags: v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3
# 1aee41f6	29-Oct-2014	Goldwyn Rodrigues <rgoldwyn@suse.com>	Add new disk to clustered array Algorithm: 1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD) 2. Node 1 sends NEW Add new disk to clustered array Algorithm: 1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD) 2. Node 1 sends NEWDISK with uuid and slot number 3. Other nodes issue kobject_uevent_env with uuid and slot number (Steps 4,5 could be a udev rule) 4. In userspace, the node searches for the disk, perhaps using blkid -t SUB_UUID="" 5. Other nodes issue either of the following depending on whether the disk was found: ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and disc.number set to slot number) ioctl(CLUSTERED_DISK_NACK) 6. Other nodes drop lock on no-new-devs (CR) if device is found 7. Node 1 attempts EX lock on no-new-devs 8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk as SpareLocal 9. If not (get no-new-dev lock), it fails the operation and sends METADATA_UPDATED 10. Other nodes understand if the device is added or not by reading the superblock again after receiving the METADATA_UPDATED message. Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> show more ...
Revision tags: v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1
# 7d49ffcf	12-Aug-2014	Goldwyn Rodrigues <rgoldwyn@suse.com>	Read from the first device when an area is resyncing set choose_first true for cluster read in read balance when the area is resyncing. Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: Read from the first device when an area is resyncing set choose_first true for cluster read in read balance when the area is resyncing. Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> show more ...
Revision tags: v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15
# 589a1c49	07-Jun-2014	Goldwyn Rodrigues <rgoldwyn@suse.com>	Suspend writes in RAID1 if within range If there is a resync going on, all nodes must suspend writes to the range. This is recorded in the suspend_info/suspend_list. If there is an I/O within the r Suspend writes in RAID1 if within range If there is a resync going on, all nodes must suspend writes to the range. This is recorded in the suspend_info/suspend_list. If there is an I/O within the ranges of any of the suspend_info, should_suspend will return 1. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> show more ...
# ab713cdc	12-Feb-2015	Nate Dailey <nate.dailey@stratus.com>	md/raid1: round up to bdev_logical_block_size in narrow_write_error This modifies raid1's narrow_write_error to round up block_sectors to the device's logical block size. This prevents sd complaini md/raid1: round up to bdev_logical_block_size in narrow_write_error This modifies raid1's narrow_write_error to round up block_sectors to the device's logical block size. This prevents sd complaining about "Bad block number requested" for non-512-byte sector disks. Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: NeilBrown <neilb@suse.de> show more ...
# afa0f557	14-Dec-2014	NeilBrown <neilb@suse.de>	md: rename ->stop to ->free Now that the ->stop function only frees the private data, rename is accordingly. Also pass in the private pointer as an arg rather than using mddev->private. This flexi md: rename ->stop to ->free Now that the ->stop function only frees the private data, rename is accordingly. Also pass in the private pointer as an arg rather than using mddev->private. This flexibility will be useful in level_store(). Finally, don't clear ->private. It doesn't make sense to clear it seeing that isn't what we free, and it is no longer necessary to clear ->private (it was some time ago before ->to_remove was introduced). Setting ->to_remove in ->free() is a bit of a wart, but not a big problem at the moment. Signed-off-by: NeilBrown <neilb@suse.de> show more ...
# 5aa61f42	14-Dec-2014	NeilBrown <neilb@suse.de>	md: split detach operation out from ->stop. Each md personality has a 'stop' operation which does two things: 1/ it finalizes some aspects of the array to ensure nothing is accessing the ->priv md: split detach operation out from ->stop. Each md personality has a 'stop' operation which does two things: 1/ it finalizes some aspects of the array to ensure nothing is accessing the ->private data 2/ it frees the ->private data. All the steps in '1' can apply to all arrays and so can be performed in common code. This is useful as in the case where we change the personality which manages an array (in level_store()), it would be helpful to do step 1 early, and step 2 later. So split the 'step 1' functionality out into a new mddev_detach(). Signed-off-by: NeilBrown <neilb@suse.de> show more ...
# 64590f45	14-Dec-2014	NeilBrown <neilb@suse.de>	md: make merge_bvec_fn more robust in face of personality changes. There is no locking around calls to merge_bvec_fn(), so it is possible that calls which coincide with a level (or personality) chan md: make merge_bvec_fn more robust in face of personality changes. There is no locking around calls to merge_bvec_fn(), so it is possible that calls which coincide with a level (or personality) change could go wrong. So create a central dispatch point for these functions and use rcu_read_lock(). If the array is suspended, reject any merge that can be rejected. If not, we know it is safe to call the function. Signed-off-by: NeilBrown <neilb@suse.de> show more ...
# 5c675f83	14-Dec-2014	NeilBrown <neilb@suse.de>	md: make ->congested robust against personality changes. There is currently no locking around calls to the 'congested' bdi function. If called at an awkward time while an array is being converted f md: make ->congested robust against personality changes. There is currently no locking around calls to the 'congested' bdi function. If called at an awkward time while an array is being converted from one level (or personality) to another, there is a tiny chance of running code in an unreferenced module etc. So add a 'congested' function to the md_personality operations structure, and call it with appropriate locking from a central 'mddev_congested'. When the array personality is changing the array will be 'suspended' so no IO is processed. If mddev_congested detects this, it simply reports that the array is congested, which is a safe guess. As mddev_suspend calls synchronize_rcu(), mddev_congested can avoid races by included the whole call inside an rcu_read_lock() region. This require that the congested functions for all subordinate devices can be run under rcu_lock. Fortunately this is the case. Signed-off-by: NeilBrown <neilb@suse.de> show more ...
1 2 3 4 5 6 7 8910 >>...41