#
c186b128 |
| 30-Sep-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Perform resync/recovery under a DLM lock
Resync or recovery must be performed by only one node at a time. A DLM lock resource, resync_lockres, provides the mutual exclusion so that only one node performs the recovery/resync at a time.
If a node is unable to get the resync_lockres because recovery is being performed by another node, it sets MD_RECOVERY_NEEDED so as to schedule recovery in the future.
Remove the debug message in resync_info_update() used during development.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
|
Revision tags: v4.3-rc1, v4.2, v4.2-rc8 |
|
#
70bcecdb |
| 21-Aug-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Improve md_reload_sb to be less error prone
md_reload_sb is too simplistic and it explicitly needs to determine the changes made by the writing node. However, there are multiple areas where a simple reload could fail.
Instead, read the superblock of one of the "good" rdevs and update the necessary information:
- read the superblock into a newly allocated page, by temporarily swapping out rdev->sb_page and calling ->load_super.
- if that fails, return
- if it succeeds, call check_sb_changes, which:
  1. iterates over the list of active devices and checks the matching dev_roles[] value.
     If that is 'faulty', the device must be marked as faulty:
     - call md_error to mark the device as faulty. Make sure not to set CHANGE_DEVS and wake up mddev->thread, or else it would initiate a resync process, which is the responsibility of the "primary" node.
     - clear the Blocked bit
     - call remove_and_add_spares() to hot remove the device.
     If the device is 'spare':
     - call remove_and_add_spares() to get the number of spares added in this operation.
     - reduce mddev->degraded to mark the array as not degraded.
  2. resets recovery_cp
- read the rest of the rdevs to update recovery_offset. If recovery_offset is equal to MaxSector, call spare_active() to set it In_sync
This required that recovery_offset be initialized to MaxSector, as opposed to zero so as to communicate the end of sync for a rdev.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
|
#
c40f341f |
| 18-Aug-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
md-cluster: Use a small window for resync
Suspending the entire device for resync could take too long. Resync in small chunks.
The cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window:
1. Set cluster_sync_low to curr_resync_completed.
2. Check if the sync will fit in the new window; if not, issue a wait_barrier() and set cluster_sync_low to sector_nr.
3. Set cluster_sync_high to cluster_sync_low + resync_window.
4. Send a message to all nodes so they may add it to their suspension list.
bitmap_cond_end_sync is modified to allow forcing a sync in order to bring curr_resync_completed up to date with the sector passed.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>
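The four window steps above can be sketched in plain C as follows. This is only an illustrative userspace model, not the raid1 code: the struct, the function name, the "outside the window" test and the assumption that 32M means 32 MiB counted in 512-byte sectors are all hypothetical, and the barrier wait and the cluster message are reduced to comments.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Assumption: the 32M window is 32 MiB, counted in 512-byte sectors. */
#define RESYNC_WINDOW_SECTORS ((sector_t)(32u * 1024u * 1024u / 512u))

/* Hypothetical stand-in for the cluster_sync_low/high pair kept in r1conf. */
struct sync_window_model {
    sector_t low;
    sector_t high;
};

/*
 * Move the window if the request [sector_nr, sector_nr + nr_sectors) does
 * not lie inside it. Returns 1 when the window moved (the point at which
 * the other nodes would be told about the new range), 0 otherwise.
 */
static int sync_window_update(struct sync_window_model *w, sector_t sector_nr,
                              sector_t nr_sectors,
                              sector_t curr_resync_completed)
{
    assert(nr_sectors <= RESYNC_WINDOW_SECTORS);

    if (sector_nr >= w->low && sector_nr + nr_sectors <= w->high)
        return 0;                       /* still inside the current window */

    w->low = curr_resync_completed;     /* step 1 */
    if (sector_nr + nr_sectors > w->low + RESYNC_WINDOW_SECTORS) {
        /* step 2: the real code waits on a barrier before jumping ahead */
        w->low = sector_nr;
    }
    w->high = w->low + RESYNC_WINDOW_SECTORS;   /* step 3 */
    return 1;                           /* step 4: caller notifies the nodes */
}
```

Starting from an empty window, the first request opens a window at the completed position; later requests inside it leave it untouched.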
|
#
a452744b |
| 01-Oct-2015 |
Mikulas Patocka <mpatocka@redhat.com> |
crash in md-raid1 and md-raid10 due to incorrect list manipulation
The commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure device failure recorded before write request returns) is causing a crash in the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100% reproducible.
The reason for the crash is that the newly added code in raid1d moves the list from conf->bio_end_io_list to tmp, then tests if tmp is non-empty and then incorrectly pops the bio from conf->bio_end_io_list (which is empty because the list was already moved).
Raid-10 has a similar bug.
Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000) CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35 task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111111000001111 Not tainted r00-03 000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38 r04-07 000000001059d000 000000007f16be08 0000000000200200 0000000000000001 r08-11 000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000 r12-15 000000004056f320 0000000000000000 0000000000013dd0 0000000000000000 r16-19 00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001 r20-23 000000000800000f 0000000042200390 0000000000000000 0000000000000000 r24-27 0000000000000001 000000000800000f 000000007f16be08 000000001059d000 r28-31 0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000 sr00-03 0000000000249800 0000000000000000 0000000000000000 0000000000249800 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620 IIR: 0f8010c6 ISR: 0000000000000000 IOR: 0000000100000000 CPU: 3 CR30: 000000006ccb8000 CR31: 0000000000000000 ORIG_R28: 000000001059d000 IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1] IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1] RP(r2): raid_end_bio_io+0x88/0x168 [raid1] Backtrace: [<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1] [<00000000105a4f64>] raid1d+0x144/0x1640 [raid1] [<000000004017fd5c>] kthread+0x144/0x160
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.") Fixes: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.") Signed-off-by: NeilBrown <neilb@suse.com>
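The list handling described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: the list type and every function name below are hypothetical stand-ins for the kernel's list helpers, and the "request" is just a list node. The point it shows is that once the entries have been moved to a private list, they must be popped from that private list, not from the original one.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list head/node. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

static int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

static void list_append(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Move every entry from 'from' to the empty list 'to'; 'from' ends up empty. */
static void list_move_all(struct list_node *from, struct list_node *to)
{
    assert(list_is_empty(to));
    if (list_is_empty(from))
        return;
    to->next = from->next;
    to->prev = from->prev;
    to->next->prev = to;
    to->prev->next = to;
    list_init(from);
}

/* Unlink and return the first entry, or NULL if the list is empty. */
static struct list_node *list_pop_front(struct list_node *head)
{
    struct list_node *node = head->next;

    if (node == head)
        return NULL;
    head->next = node->next;
    node->next->prev = head;
    node->next = node->prev = node;
    return node;
}

/* Drain 'pending' the corrected way; returns the number of entries handled. */
static int drain_pending(struct list_node *pending)
{
    struct list_node tmp;
    int handled = 0;

    list_init(&tmp);
    list_move_all(pending, &tmp);
    while (!list_is_empty(&tmp)) {
        /* The buggy variant popped from 'pending', which is empty by now. */
        struct list_node *node = list_pop_front(&tmp);

        (void)node;     /* a real caller would complete the request here */
        handled++;
    }
    return handled;
}
```

In this model a pop from the already emptied list returns NULL; in the kernel the equivalent mistake dereferences the list head as if it were an entry, which matches the fault shown in the trace.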
|
#
e8ff8bf0 |
| 16-Sep-2015 |
Jes Sorensen <Jes.Sorensen@redhat.com> |
md/raid1: Avoid raid1 resync getting stuck
close_sync() needs to set conf->next_resync to a large but safe value below MaxSector and use it to determine whether or not to set start_next_window in wait_barrier().
Solution suggested by Neil Brown.
Reported-by: Nate Dailey <nate.dailey@stratus.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
|
#
644df1a8 |
| 13-Sep-2015 |
Julia Lawall <Julia.Lawall@lip6.fr> |
md: drop null test before destroy functions
Remove unneeded NULL test.
The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NeilBrown <neilb@suse.com>
|
Revision tags: v4.2-rc7 |
|
#
55ce74d4 |
| 13-Aug-2015 |
NeilBrown <neilb@suse.com> |
md/raid1: ensure device failure recorded before write request returns.
When a write to one of the legs of a RAID1 fails, the failure is recorded in the metadata of the other leg(s) so that after a restart the data on the failed drive won't be trusted even if that drive seems to be working again (maybe a cable was unplugged).
Similarly when we record a bad-block in response to a write failure, we must not let the write complete until the bad-block update is safe.
Currently there is no interlock between the write request completing and the metadata update. So it is possible that the write will complete, the app will confirm success in some way, and then the machine will crash before the metadata update completes.
This is an extremely small hole for a race to fit in, but it is theoretically possible and so should be closed.
So:
- set MD_CHANGE_PENDING when requesting a metadata update for a failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.
Signed-off-by: NeilBrown <neilb@suse.com>
|
Revision tags: v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2 |
|
#
985ca973 |
| 05-Jul-2015 |
NeilBrown <neilb@suse.com> |
md: close some races between setting and checking sync_action.
When checking sync_action in a script, we want to be sure it is as accurate as possible. As resync/reshape etc. don't always start immediately (a separate thread is scheduled to do it), it is best if 'action_show' checks if MD_RECOVERY_NEEDED is set (which it does) and in that case reports what is likely to start soon (which it only sometimes does).
So:
- report 'reshape' if reshape_position suggests one might start.
- set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very likely to happen next.
Signed-off-by: NeilBrown <neilb@suse.com>
|
Revision tags: v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2 |
|
#
8ae12666 |
| 28-Apr-2015 |
Kent Overstreet <kent.overstreet@gmail.com> |
block: kill merge_bvec_fn() completely
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely.
Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
423f04d6 |
| 26-Jul-2015 |
NeilBrown <neilb@suse.com> |
md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies
raid1_end_read_request() assumes that the In_sync bits are consistent with the ->degraded count. raid1_spare_active() updates the In_sync bit before the ->degraded count and so exposes an inconsistency, as does error(). So extend the spinlock in raid1_spare_active() and error() to hide those inconsistencies.
This should probably be part of Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from last working device'.") as it addresses the same issue. It fixes the same bug and should go to -stable for same reasons.
Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com>
|
#
b7c44ed9 |
| 24-Jul-2015 |
Jens Axboe <axboe@fb.com> |
block: manipulate bio->bi_flags through helpers
Some places use helpers now, others don't. We only have the 'is set' helper, add helpers for setting and clearing flags too.
It was a bit of a mess of atomic vs non-atomic access. With BIO_UPTODATE gone, we don't have any risk of concurrent access to the flags. So relax the restriction and don't make any of them atomic. The flags that do have serialization issues (reffed and chained), we already handle those separately.
Signed-off-by: Jens Axboe <axboe@fb.com>
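As an illustration of the helper style this commit describes (test, set and clear through small non-atomic accessors), here is a userspace sketch. The struct, the flag bit numbers and all function names are hypothetical and only model the idea; they are not the block layer's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio; only the flags word is modeled. */
struct bio_model {
    unsigned int flags;
};

/* Hypothetical flag bit numbers, chosen only for this example. */
enum {
    MODEL_FLAG_A = 1,
    MODEL_FLAG_B = 2
};

/* 'is set' helper: true when the given bit is set in the flags word. */
static bool bio_model_flagged(const struct bio_model *bio, unsigned int bit)
{
    assert(bit < 32);
    return (bio->flags & (1U << bit)) != 0;
}

/* Plain read-modify-write: no atomics, callers must not race on flags. */
static void bio_model_set_flag(struct bio_model *bio, unsigned int bit)
{
    assert(bit < 32);
    bio->flags |= 1U << bit;
}

static void bio_model_clear_flag(struct bio_model *bio, unsigned int bit)
{
    assert(bit < 32);
    bio->flags &= ~(1U << bit);
}
```

Routing every access through the three helpers keeps the "atomic or not" decision in one place, which is the cleanup the message is after.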
|
#
4246a0b6 |
| 20-Jul-2015 |
Christoph Hellwig <hch@lst.de> |
block: add a bi_error field to struct bio
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not being persistent when bios are queued up, and of not being passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns.
So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
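The benefit of storing the errno in the request itself can be shown with a small model. This is a hedged sketch, not the block layer: the struct and both functions are hypothetical, and the propagation rule (the parent keeps the first error it is handed) is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a chained request; not the kernel's struct bio. */
struct io_model {
    int error;                  /* 0 on success, negative errno on failure */
    struct io_model *parent;    /* NULL for the top-level request */
};

/* Record the error on the request itself so it survives queueing. */
static void io_model_fail(struct io_model *io, int err)
{
    assert(err < 0);
    io->error = err;
}

/* On completion, hand the first recorded error up the chain. */
static void io_model_complete(struct io_model *io)
{
    for (; io->parent != NULL; io = io->parent) {
        if (io->error != 0 && io->parent->error == 0)
            io->parent->error = io->error;
    }
}
```

Unlike a single "not up to date" bit, the stored value can distinguish, say, -ENOSPC from -EIO, and it reaches the parent without any extra callback argument.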
|
#
90382ed9 |
| 24-Jun-2015 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
Fix read-balancing during node failure
During a node failure, we need to suspend read balancing so that the reads are directed to the first device and stale data is not read. Suspending writes is not required because these would be recorded and synced eventually.
A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep(). area_resyncing() will respond true for the entire device if this flag is set and the request type is READ. The flag is cleared in recover_done().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reported-By: David Teigland <teigland@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com>
|
#
34cab6f4 |
| 23-Jul-2015 |
NeilBrown <neilb@suse.com> |
md/raid1: fix test for 'was read error from last working device'.
When we get a read error from the last working device, we don't try to repair it, and don't fail the device. We simply report a read error to the caller.
However the current test for 'is this the last working device' is wrong. When there is only one fully working device, it assumes that a non-faulty device is that device. However a spare which is rebuilding would be non-faulty but is not the only working device.
So change the test from "!Faulty" to "In_sync". If ->degraded says there is only one fully working device and this device is in_sync, this must be the one.
This bug has existed since we allowed read_balance to read from a recovering spare in v3.0
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Fixes: 76073054c95b ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com>
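The difference between the two tests can be shown with a small model. This is a sketch under stated assumptions: the struct and function names are hypothetical, and "->degraded says there is only one fully working device" is modeled as degraded == raid_disks - 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; the real code keeps these as flag bits. */
struct member_model {
    bool faulty;
    bool in_sync;
};

/* Old test: any non-faulty device counts as "the last working device". */
static bool last_working_old(const struct member_model *m, int degraded,
                             int raid_disks)
{
    assert(raid_disks > 0);
    return degraded == raid_disks - 1 && !m->faulty;
}

/* New test: only an In_sync device can be the last working device. */
static bool last_working_new(const struct member_model *m, int degraded,
                             int raid_disks)
{
    assert(raid_disks > 0);
    return degraded == raid_disks - 1 && m->in_sync;
}
```

For a rebuilding spare (not faulty, not in sync) in a two-disk array with one fully working device, the old test answers true while the new one answers false, which is the case the commit message describes.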
|
#
4452226e |
| 22-May-2015 |
Tejun Heo <tj@kernel.org> |
writeback: move backing_dev_info->state into bdi_writeback
Currently, a bdi (backing_dev_info) embeds a single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently.
This patch moves bdi->state into wb.
* enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_.
* Explicit zeroing of bdi->state is removed without adding zeroing of wb->state as the whole data structure is zeroed on init anyway.
* As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
Revision tags: v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1 |
|
#
09314799 |
| 18-Feb-2015 |
NeilBrown <neilb@suse.de> |
md: remove 'go_faster' option from ->sync_request()
This option is not well justified and testing suggests that it hardly ever makes any difference.
The comment suggests there might be a need to wait for non-resync activity indicated by ->nr_waiting, however raise_barrier() already waits for all of that.
So just remove it to simplify reasoning about speed limiting.
This allows us to remove a 'FIXME' comment from raid5.c as that never used the flag.
Signed-off-by: NeilBrown <neilb@suse.de>
|
#
d1901ef0 |
| 22-Feb-2015 |
Tomáš Hodek <tomas.hodek@volny.cz> |
md/raid1: fix read balance when a drive is write-mostly.
When a drive is marked write-mostly it should only be the target of reads if there is no other option.
This behaviour was broken by
commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc md/raid1: read balance chooses idlest disk for SSD
which causes a write-mostly device to be *preferred* in some cases.
Restore correct behaviour by checking and setting best_dist_disk and best_pending_disk rather than best_disk.
We only need to test one of these as they are both changed from -1 to >=0 at the same time.
As we leave min_pending and best_dist unchanged, any non-write-mostly device will appear better than the write-mostly device.
Reported-by: Tomáš Hodek <tomas.hodek@volny.cz> Reported-by: Dark Penguin <darkpenguin@yandex.ru> Signed-off-by: NeilBrown <neilb@suse.de> Link: http://marc.info/?l=linux-raid&m=135982797322422 Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc Cc: stable@vger.kernel.org (3.6+)
|
Revision tags: v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3 |
|
#
1aee41f6 |
| 29-Oct-2014 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
Add new disk to clustered array
Algorithm:
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
2. Node 1 sends NEWDISK with uuid and slot number
3. Other nodes issue kobject_uevent_env with uuid and slot number (Steps 4,5 could be a udev rule)
4. In userspace, the node searches for the disk, perhaps using blkid -t SUB_UUID=""
5. Other nodes issue either of the following depending on whether the disk was found:
   ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and disc.number set to slot number)
   ioctl(CLUSTERED_DISK_NACK)
6. Other nodes drop lock on no-new-devs (CR) if device is found
7. Node 1 attempts EX lock on no-new-devs
8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk as SpareLocal
9. If not (get no-new-dev lock), it fails the operation and sends METADATA_UPDATED
10. Other nodes understand if the device is added or not by reading the superblock again after receiving the METADATA_UPDATED message.
Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
|
Revision tags: v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1 |
|
#
7d49ffcf |
| 12-Aug-2014 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
Read from the first device when an area is resyncing
Set choose_first to true for cluster reads in read balance when the area is resyncing.
Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
|
Revision tags: v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15 |
|
#
589a1c49 |
| 07-Jun-2014 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
Suspend writes in RAID1 if within range
If there is a resync going on, all nodes must suspend writes to the range. This is recorded in the suspend_info/suspend_list.
If there is an I/O within the ranges of any of the suspend_info, should_suspend will return 1.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
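The range check described above can be sketched as a simple overlap test. This is a userspace illustration only: the array-based storage, the half-open [lo, hi) ranges and the names below are assumptions made for the example, not the md-cluster data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical suspended range, half-open: [lo, hi). */
struct suspend_range_model {
    sector_t lo;
    sector_t hi;
};

/*
 * Return 1 if the I/O [io_lo, io_hi) overlaps any suspended range,
 * 0 otherwise.
 */
static int should_suspend_sketch(const struct suspend_range_model *ranges,
                                 size_t count, sector_t io_lo, sector_t io_hi)
{
    size_t i;

    assert(io_lo <= io_hi);
    for (i = 0; i < count; i++) {
        if (io_lo < ranges[i].hi && io_hi > ranges[i].lo)
            return 1;
    }
    return 0;
}
```

A write that merely touches the boundary of a range does not overlap it under the half-open convention chosen here.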
|
#
ab713cdc |
| 12-Feb-2015 |
Nate Dailey <nate.dailey@stratus.com> |
md/raid1: round up to bdev_logical_block_size in narrow_write_error
This modifies raid1's narrow_write_error to round up block_sectors to the device's logical block size.
This prevents sd complaining about "Bad block number requested" for non-512-byte sector disks.
Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: NeilBrown <neilb@suse.de>
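The rounding itself is simple arithmetic and can be sketched as below. The function name and parameters are hypothetical; the sketch assumes the logical block size is given in bytes and is a multiple of 512.

```c
#include <assert.h>

/*
 * Round a sector count up to a whole number of logical blocks.
 * block_sectors is in 512-byte sectors, logical_block_bytes in bytes.
 */
static unsigned int round_up_block_sectors(unsigned int block_sectors,
                                           unsigned int logical_block_bytes)
{
    unsigned int lbs_sectors = logical_block_bytes >> 9;

    assert(lbs_sectors > 0);
    return ((block_sectors + lbs_sectors - 1) / lbs_sectors) * lbs_sectors;
}
```

With a 4096-byte logical block, a 1-sector request is widened to 8 sectors, so the device never sees a request smaller than one logical block.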
|
#
afa0f557 |
| 14-Dec-2014 |
NeilBrown <neilb@suse.de> |
md: rename ->stop to ->free
Now that the ->stop function only frees the private data, rename it accordingly.
Also pass in the private pointer as an arg rather than using mddev->private. This flexibility will be useful in level_store().
Finally, don't clear ->private. It doesn't make sense to clear it seeing that it isn't what we free, and it is no longer necessary to clear ->private (it was some time ago before ->to_remove was introduced).
Setting ->to_remove in ->free() is a bit of a wart, but not a big problem at the moment.
Signed-off-by: NeilBrown <neilb@suse.de>
|
#
5aa61f42 |
| 14-Dec-2014 |
NeilBrown <neilb@suse.de> |
md: split detach operation out from ->stop.
Each md personality has a 'stop' operation which does two things:
1/ it finalizes some aspects of the array to ensure nothing is accessing the ->private data
2/ it frees the ->private data.
All the steps in '1' can apply to all arrays and so can be performed in common code.
This is useful as in the case where we change the personality which manages an array (in level_store()), it would be helpful to do step 1 early, and step 2 later.
So split the 'step 1' functionality out into a new mddev_detach().
Signed-off-by: NeilBrown <neilb@suse.de>
|
#
64590f45 |
| 14-Dec-2014 |
NeilBrown <neilb@suse.de> |
md: make merge_bvec_fn more robust in face of personality changes.
There is no locking around calls to merge_bvec_fn(), so it is possible that calls which coincide with a level (or personality) change could go wrong.
So create a central dispatch point for these functions and use rcu_read_lock(). If the array is suspended, reject any merge that can be rejected. If not, we know it is safe to call the function.
Signed-off-by: NeilBrown <neilb@suse.de>
|
#
5c675f83 |
| 14-Dec-2014 |
NeilBrown <neilb@suse.de> |
md: make ->congested robust against personality changes.
There is currently no locking around calls to the 'congested' bdi function. If called at an awkward time while an array is being converted from one level (or personality) to another, there is a tiny chance of running code in an unreferenced module etc.
So add a 'congested' function to the md_personality operations structure, and call it with appropriate locking from a central 'mddev_congested'.
When the array personality is changing the array will be 'suspended' so no IO is processed. If mddev_congested detects this, it simply reports that the array is congested, which is a safe guess. As mddev_suspend calls synchronize_rcu(), mddev_congested can avoid races by including the whole call inside an rcu_read_lock() region. This requires that the congested functions for all subordinate devices can be run under rcu_lock. Fortunately this is the case.
Signed-off-by: NeilBrown <neilb@suse.de>
|