History log of /openbmc/linux/drivers/md/raid1.c (Results 201 – 225 of 1017)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# c186b128 30-Sep-2015 Goldwyn Rodrigues <rgoldwyn@suse.com>

md-cluster: Perform resync/recovery under a DLM lock

Resync or recovery must be performed by only one node at a time.
A DLM lock resource, resync_lockres provides the mutual exclusion
so that only o

md-cluster: Perform resync/recovery under a DLM lock

Resync or recovery must be performed by only one node at a time.
A DLM lock resource, resync_lockres provides the mutual exclusion
so that only one node performs the recovery/resync at a time.

If a node is unable to get the resync_lockres, because recovery is
being performed by another node, it set MD_RECOVER_NEEDED so as
to schedule recovery in the future.

Remove the debug message in resync_info_update()
used during development.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

show more ...


Revision tags: v4.3-rc1, v4.2, v4.2-rc8
# 70bcecdb 21-Aug-2015 Goldwyn Rodrigues <rgoldwyn@suse.com>

md-cluster: Improve md_reload_sb to be less error prone

md_reload_sb is too simplistic and it explicitly needs to determine
the changes made by the writing node. However, there are multiple areas
wh

md-cluster: Improve md_reload_sb to be less error prone

md_reload_sb is too simplistic and it explicitly needs to determine
the changes made by the writing node. However, there are multiple areas
where a simple reload could fail.

Instead, read the superblock of one of the "good" rdevs and update
the necessary information:

- read the superblock into a newly allocated page, by temporarily
swapping out rdev->sb_page and calling ->load_super.
- if that fails return
- if it succeeds, call check_sb_changes
1. iterates over list of active devices and checks the matching
dev_roles[] value.
If that is 'faulty', the device must be marked as faulty
- call md_error to mark the device as faulty. Make sure
not to set CHANGE_DEVS and wakeup mddev->thread or else
it would initiate a resync process, which is the responsibility
of the "primary" node.
- clear the Blocked bit
- Call remove_and_add_spares() to hot remove the device.
If the device is 'spare':
- call remove_and_add_spares() to get the number of spares
added in this operation.
- Reduce mddev->degraded to mark the array as not degraded.
2. reset recovery_cp
- read the rest of the rdevs to update recovery_offset. If recovery_offset
is equal to MaxSector, call spare_active() to set it In_sync

This required that recovery_offset be initialized to MaxSector, as
opposed to zero so as to communicate the end of sync for a rdev.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

show more ...


# c40f341f 18-Aug-2015 Goldwyn Rodrigues <rgoldwyn@suse.com>

md-cluster: Use a small window for resync

Suspending the entire device for resync could take too long. Resync
in small chunks.

cluster's resync window (32M) is maintained in r1conf as
cluster_sync_

md-cluster: Use a small window for resync

Suspending the entire device for resync could take too long. Resync
in small chunks.

cluster's resync window (32M) is maintained in r1conf as
cluster_sync_low and cluster_sync_high and processed in
raid1's sync_request(). If the current resync is outside the cluster
resync window:

1. Set the cluster_sync_low to curr_resync_completed.
2. Check if the sync will fit in the new window, if not issue a
wait_barrier() and set cluster_sync_low to sector_nr.
3. Set cluster_sync_high to cluster_sync_low + resync_window.
4. Send a message to all nodes so they may add it in their suspension
list.

bitmap_cond_end_sync is modified to allow to force a sync inorder
to get the curr_resync_completed uptodate with the sector passed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


# a452744b 01-Oct-2015 Mikulas Patocka <mpatocka@redhat.com>

crash in md-raid1 and md-raid10 due to incorrect list manipulation

The commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure
device failure recorded before write request returns) is caus

crash in md-raid1 and md-raid10 due to incorrect list manipulation

The commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure
device failure recorded before write request returns) is causing crash in
the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100%
reproducible.

The reason for the crash is that the newly added code in raid1d moves the
list from conf->bio_end_io_list to tmp, then tests if tmp is non-empty and
then incorrectly pops the bio from conf->bio_end_io_list (which is empty
because the list was alrady moved).

Raid-10 has a similar bug.

Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000)
CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35
task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000

YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111000001111 Not tainted
r00-03 000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38
r04-07 000000001059d000 000000007f16be08 0000000000200200 0000000000000001
r08-11 000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000
r12-15 000000004056f320 0000000000000000 0000000000013dd0 0000000000000000
r16-19 00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001
r20-23 000000000800000f 0000000042200390 0000000000000000 0000000000000000
r24-27 0000000000000001 000000000800000f 000000007f16be08 000000001059d000
r28-31 0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000
sr00-03 0000000000249800 0000000000000000 0000000000000000 0000000000249800
sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620
IIR: 0f8010c6 ISR: 0000000000000000 IOR: 0000000100000000
CPU: 3 CR30: 000000006ccb8000 CR31: 0000000000000000
ORIG_R28: 000000001059d000
IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1]
IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1]
RP(r2): raid_end_bio_io+0x88/0x168 [raid1]
Backtrace:
[<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1]
[<00000000105a4f64>] raid1d+0x144/0x1640 [raid1]
[<000000004017fd5c>] kthread+0x144/0x160

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
Fixes: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


# e8ff8bf0 16-Sep-2015 Jes Sorensen <Jes.Sorensen@redhat.com>

md/raid1: Avoid raid1 resync getting stuck

close_sync() needs to set conf->next_resync to a large, but safe value
below MaxSector and use it to determine whether or not to set
start_next_window in w

md/raid1: Avoid raid1 resync getting stuck

close_sync() needs to set conf->next_resync to a large, but safe value
below MaxSector and use it to determine whether or not to set
start_next_window in wait_barrier()

Solution suggested by Neil Brown.

Reported-by: Nate Dailey <nate.dailey@stratus.com>
Tested-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


# 644df1a8 13-Sep-2015 Julia Lawall <Julia.Lawall@lip6.fr>

md: drop null test before destroy functions

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x !=

md: drop null test before destroy functions

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
\(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


Revision tags: v4.2-rc7
# 55ce74d4 13-Aug-2015 NeilBrown <neilb@suse.com>

md/raid1: ensure device failure recorded before write request returns.

When a write to one of the legs of a RAID1 fails, the failure is
recorded in the metadata of the other leg(s) so that after a r

md/raid1: ensure device failure recorded before write request returns.

When a write to one of the legs of a RAID1 fails, the failure is
recorded in the metadata of the other leg(s) so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).

Similarly when we record a bad-block in response to a write failure,
we must not let the write complete until the bad-block update is safe.

Currently there is no interlock between the write request completing
and the metadata update. So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.

This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.

So:
- set MD_CHANGE_PENDING when requesting a metadata update for a
failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which
is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.

Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


Revision tags: v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2
# 985ca973 05-Jul-2015 NeilBrown <neilb@suse.com>

md: close some races between setting and checking sync_action.

When checking sync_action in a script, we want to be sure it is
as accurate as possible.
As resync/reshape etc doesn't always start imm

md: close some races between setting and checking sync_action.

When checking sync_action in a script, we want to be sure it is
as accurate as possible.
As resync/reshape etc doesn't always start immediately (a separate
thread is scheduled to do it), it is best if 'action_show'
checks if MD_RECOVER_NEEDED is set (which it does) and in that
case reports what is likely to start soon (which it only sometimes
does).

So:
- report 'reshape' if reshape_position suggests one might start.
- set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very
likely to happen next.

Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


Revision tags: v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2
# 8ae12666 28-Apr-2015 Kent Overstreet <kent.overstreet@gmail.com>

block: kill merge_bvec_fn() completely

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bve

block: kill merge_bvec_fn() completely

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

show more ...


# 423f04d6 26-Jul-2015 NeilBrown <neilb@suse.com>

md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active

md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
last working device'.")
as it addresses the same issue. It fixes the same bug and should go
to -stable for same reasons.

Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


# b7c44ed9 24-Jul-2015 Jens Axboe <axboe@fb.com>

block: manipulate bio->bi_flags through helpers

Some places use helpers now, others don't. We only have the 'is set'
helper, add helpers for setting and clearing flags too.

It was a bit of a mess o

block: manipulate bio->bi_flags through helpers

Some places use helpers now, others don't. We only have the 'is set'
helper, add helpers for setting and clearing flags too.

It was a bit of a mess of atomic vs non-atomic access. With
BIO_UPTODATE gone, we don't have any risk of concurrent access to the
flags. So relax the restriction and don't make any of them atomic. The
flags that do have serialization issues (reffed and chained), we
already handle those separately.

Signed-off-by: Jens Axboe <axboe@fb.com>

show more ...


# 4246a0b6 20-Jul-2015 Christoph Hellwig <hch@lst.de>

block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the b

block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

show more ...


# 90382ed9 24-Jun-2015 Goldwyn Rodrigues <rgoldwyn@suse.com>

Fix read-balancing during node failure

During a node failure, We need to suspend read balancing so that the
reads are directed to the first device and stale data is not read.
Suspending writes is no

Fix read-balancing during node failure

During a node failure, We need to suspend read balancing so that the
reads are directed to the first device and stale data is not read.
Suspending writes is not required because these would be recorded and
synced eventually.

A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep().
area_resyncing() will respond true for the entire devices if this
flag is set and the request type is READ. The flag is cleared
in recover_done().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reported-By: David Teigland <teigland@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


# 34cab6f4 23-Jul-2015 NeilBrown <neilb@suse.com>

md/raid1: fix test for 'was read error from last working device'.

When we get a read error from the last working device, we don't
try to repair it, and don't fail the device. We simple report a
rea

md/raid1: fix test for 'was read error from last working device'.

When we get a read error from the last working device, we don't
try to repair it, and don't fail the device. We simple report a
read error to the caller.

However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device. However a spare which is rebuilding
would be non-faulty but so not the only working device.

So change the test from "!Faulty" to "In_sync". If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.

This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0

Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>

show more ...


# 4452226e 22-May-2015 Tejun Heo <tj@kernel.org>

writeback: move backing_dev_info->state into bdi_writeback

Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
wri

writeback: move backing_dev_info->state into bdi_writeback

Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.

This patch moves bdi->state into wb.

* enum bdi_state is renamed to wb_state and the prefix of all enums is
changed from BDI_ to WB_.

* Explicit zeroing of bdi->state is removed without adding zeoring of
wb->state as the whole data structure is zeroed on init anyway.

* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->state are mechanically replaced with bdi->wb.state
introducing no behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: drbd-dev@lists.linbit.com
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

show more ...


Revision tags: v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1
# 09314799 18-Feb-2015 NeilBrown <neilb@suse.de>

md: remove 'go_faster' option from ->sync_request()

This option is not well justified and testing suggests that
it hardly ever makes any difference.

The comment suggests there might be a need to wa

md: remove 'go_faster' option from ->sync_request()

This option is not well justified and testing suggests that
it hardly ever makes any difference.

The comment suggests there might be a need to wait for non-resync
activity indicated by ->nr_waiting, however raise_barrier()
already waits for all of that.

So just remove it to simplify reasoning about speed limiting.

This allows us to remove a 'FIXME' comment from raid5.c as that
never used the flag.

Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


# d1901ef0 22-Feb-2015 Tomáš Hodek <tomas.hodek@volny.cz>

md/raid1: fix read balance when a drive is write-mostly.

When a drive is marked write-mostly it should only be the
target of reads if there is no other option.

This behaviour was broken by

commit

md/raid1: fix read balance when a drive is write-mostly.

When a drive is marked write-mostly it should only be the
target of reads if there is no other option.

This behaviour was broken by

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
md/raid1: read balance chooses idlest disk for SSD

which causes a write-mostly device to be *preferred* is some cases.

Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.

We only need to test one of these as they are both changed
from -1 or >=0 at the same time.

As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.

Reported-by: Tomáš Hodek <tomas.hodek@volny.cz>
Reported-by: Dark Penguin <darkpenguin@yandex.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@vger.kernel.org (3.6+)

show more ...


Revision tags: v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3
# 1aee41f6 29-Oct-2014 Goldwyn Rodrigues <rgoldwyn@suse.com>

Add new disk to clustered array

Algorithm:
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD)
2. Node 1 sends NEW

Add new disk to clustered array

Algorithm:
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD)
2. Node 1 sends NEWDISK with uuid and slot number
3. Other nodes issue kobject_uevent_env with uuid and slot number
(Steps 4,5 could be a udev rule)
4. In userspace, the node searches for the disk, perhaps
using blkid -t SUB_UUID=""
5. Other nodes issue either of the following depending on whether the disk
was found:
ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
disc.number set to slot number)
ioctl(CLUSTERED_DISK_NACK)
6. Other nodes drop lock on no-new-devs (CR) if device is found
7. Node 1 attempts EX lock on no-new-devs
8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
as SpareLocal
9. If not (get no-new-dev lock), it fails the operation and sends METADATA_UPDATED
10. Other nodes understand if the device is added or not by reading the superblock again after receiving the METADATA_UPDATED message.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

show more ...


Revision tags: v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1
# 7d49ffcf 12-Aug-2014 Goldwyn Rodrigues <rgoldwyn@suse.com>

Read from the first device when an area is resyncing

set choose_first true for cluster read in read balance when the area
is resyncing.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by:

Read from the first device when an area is resyncing

set choose_first true for cluster read in read balance when the area
is resyncing.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

show more ...


Revision tags: v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15
# 589a1c49 07-Jun-2014 Goldwyn Rodrigues <rgoldwyn@suse.com>

Suspend writes in RAID1 if within range

If there is a resync going on, all nodes must suspend writes to the
range. This is recorded in the suspend_info/suspend_list.

If there is an I/O within the r

Suspend writes in RAID1 if within range

If there is a resync going on, all nodes must suspend writes to the
range. This is recorded in the suspend_info/suspend_list.

If there is an I/O within the ranges of any of the suspend_info,
should_suspend will return 1.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

show more ...


# ab713cdc 12-Feb-2015 Nate Dailey <nate.dailey@stratus.com>

md/raid1: round up to bdev_logical_block_size in narrow_write_error

This modifies raid1's narrow_write_error to round up block_sectors to the
device's logical block size.

This prevents sd complaini

md/raid1: round up to bdev_logical_block_size in narrow_write_error

This modifies raid1's narrow_write_error to round up block_sectors to the
device's logical block size.

This prevents sd complaining about "Bad block number requested" for non-512-byte
sector disks.

Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


# afa0f557 14-Dec-2014 NeilBrown <neilb@suse.de>

md: rename ->stop to ->free

Now that the ->stop function only frees the private data,
rename is accordingly.

Also pass in the private pointer as an arg rather than using
mddev->private. This flexi

md: rename ->stop to ->free

Now that the ->stop function only frees the private data,
rename is accordingly.

Also pass in the private pointer as an arg rather than using
mddev->private. This flexibility will be useful in level_store().

Finally, don't clear ->private. It doesn't make sense to clear
it seeing that isn't what we free, and it is no longer necessary
to clear ->private (it was some time ago before ->to_remove was
introduced).

Setting ->to_remove in ->free() is a bit of a wart, but not a
big problem at the moment.

Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


# 5aa61f42 14-Dec-2014 NeilBrown <neilb@suse.de>

md: split detach operation out from ->stop.

Each md personality has a 'stop' operation which does two
things:
1/ it finalizes some aspects of the array to ensure nothing
is accessing the ->priv

md: split detach operation out from ->stop.

Each md personality has a 'stop' operation which does two
things:
1/ it finalizes some aspects of the array to ensure nothing
is accessing the ->private data
2/ it frees the ->private data.

All the steps in '1' can apply to all arrays and so can be
performed in common code.

This is useful as in the case where we change the personality which
manages an array (in level_store()), it would be helpful to do
step 1 early, and step 2 later.

So split the 'step 1' functionality out into a new mddev_detach().

Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


# 64590f45 14-Dec-2014 NeilBrown <neilb@suse.de>

md: make merge_bvec_fn more robust in face of personality changes.

There is no locking around calls to merge_bvec_fn(), so
it is possible that calls which coincide with a level (or personality)
chan

md: make merge_bvec_fn more robust in face of personality changes.

There is no locking around calls to merge_bvec_fn(), so
it is possible that calls which coincide with a level (or personality)
change could go wrong.

So create a central dispatch point for these functions and use
rcu_read_lock().
If the array is suspended, reject any merge that can be rejected.
If not, we know it is safe to call the function.

Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


# 5c675f83 14-Dec-2014 NeilBrown <neilb@suse.de>

md: make ->congested robust against personality changes.

There is currently no locking around calls to the 'congested'
bdi function. If called at an awkward time while an array is
being converted f

md: make ->congested robust against personality changes.

There is currently no locking around calls to the 'congested'
bdi function. If called at an awkward time while an array is
being converted from one level (or personality) to another, there
is a tiny chance of running code in an unreferenced module etc.

So add a 'congested' function to the md_personality operations
structure, and call it with appropriate locking from a central
'mddev_congested'.

When the array personality is changing the array will be 'suspended'
so no IO is processed.
If mddev_congested detects this, it simply reports that the
array is congested, which is a safe guess.
As mddev_suspend calls synchronize_rcu(), mddev_congested can
avoid races by included the whole call inside an rcu_read_lock()
region.
This require that the congested functions for all subordinate devices
can be run under rcu_lock. Fortunately this is the case.

Signed-off-by: NeilBrown <neilb@suse.de>

show more ...


12345678910>>...41