=======
dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.


Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters::

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:

  ============= ===============================================================
  raid0		RAID0 striping (no resilience)
  raid1		RAID1 mirroring
  raid4		RAID4 with dedicated last parity disk
  raid5_n 	RAID5 with dedicated last parity disk supporting takeover
		Same as raid4

		- Transitory layout
  raid5_la	RAID5 left asymmetric

		- rotating parity 0 with data continuation
  raid5_ra	RAID5 right asymmetric

		- rotating parity N with data continuation
  raid5_ls	RAID5 left symmetric

		- rotating parity 0 with data restart
  raid5_rs 	RAID5 right symmetric

		- rotating parity N with data restart
  raid6_zr	RAID6 zero restart

		- rotating parity zero (left-to-right) with data restart
  raid6_nr	RAID6 N restart

		- rotating parity N (right-to-left) with data restart
  raid6_nc	RAID6 N continue

		- rotating parity N (right-to-left) with data continuation
  raid6_n_6	RAID6 with dedicated parity disks

		- parity and Q-syndrome on the last 2 disks;
		  layout for takeover from/to raid4/raid5_n
  raid6_la_6	Same as "raid5_la" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_la from/to raid6
  raid6_ra_6	Same as "raid5_ra" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_ra from/to raid6
  raid6_ls_6	Same as "raid5_ls" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_ls from/to raid6
  raid6_rs_6	Same as "raid5_rs" plus dedicated last Q-syndrome disk

		- layout for takeover from raid5_rs from/to raid6
  raid10        Various RAID10 inspired algorithms chosen by additional params
		(see raid10_format and raid10_copies below)

		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
		- RAID1E: Integrated Adjacent Stripe Mirroring
		- RAID1E: Integrated Offset Stripe Mirroring
		- and other similar RAID10 variants
  ============= ===============================================================

  Reference: Chapter 4 of
  https://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of

    Mandatory parameters:
        <chunk_size>:
		      Chunk size in sectors.  This parameter is often known as
		      "stripe size".  It is the only mandatory parameter and
		      is placed first.

    followed by optional parameters (in any order):
	[sync|nosync]
		Force or prevent RAID initialization.

	[rebuild <idx>]
		Rebuild drive number 'idx' (first drive is 0).

	[daemon_sleep <ms>]
		Interval between runs of the bitmap daemon that
		clear bits.  A longer interval means less bitmap I/O but
		resyncing after a failure is likely to take longer.

	[min_recovery_rate <kB/sec/disk>]
		Throttle RAID initialization
	[max_recovery_rate <kB/sec/disk>]
		Throttle RAID initialization
	[write_mostly <idx>]
		Mark drive index 'idx' write-mostly.
	[max_write_behind <sectors>]
		See '--write-behind=' (man mdadm)
	[stripe_cache <sectors>]
		Stripe cache size (RAID 4/5/6 only)
	[region_size <sectors>]
		The region_size multiplied by the number of regions is the
		logical size of the array.  The bitmap records the device
		synchronisation state for each region.

        [raid10_copies   <# copies>], [raid10_format   <near|far|offset>]
		These two options are used to alter the default layout of
		a RAID10 configuration.  The number of copies can be
		specified, but the default is 2.  There are also three
		variations to how the copies are laid down - the default
		is "near".  Near copies are what most people think of with
		respect to mirroring.  If these options are left unspecified,
		or 'raid10_copies 2' and/or 'raid10_format near' are given,
		then the layouts for 2, 3 and 4 devices	are:

		========	 ==========	   ==============
		2 drives         3 drives          4 drives
		========	 ==========	   ==============
		A1  A1           A1  A1  A2        A1  A1  A2  A2
		A2  A2           A2  A3  A3        A3  A3  A4  A4
		A3  A3           A4  A4  A5        A5  A5  A6  A6
		A4  A4           A5  A6  A6        A7  A7  A8  A8
		..  ..           ..  ..  ..        ..  ..  ..  ..
		========	 ==========	   ==============

		The 2-device layout is equivalent to 2-way RAID1.  The 4-device
		layout is what a traditional RAID10 would look like.  The
		3-device layout is what might be called a 'RAID1E - Integrated
		Adjacent Stripe Mirroring'.

		If 'raid10_copies 2' and 'raid10_format far', then the layouts
		for 2, 3 and 4 devices are:

		========	     ============	  ===================
		2 drives             3 drives             4 drives
		========	     ============	  ===================
		A1  A2               A1   A2   A3         A1   A2   A3   A4
		A3  A4               A4   A5   A6         A5   A6   A7   A8
		A5  A6               A7   A8   A9         A9   A10  A11  A12
		..  ..               ..   ..   ..         ..   ..   ..   ..
		A2  A1               A3   A1   A2         A2   A1   A4   A3
		A4  A3               A6   A4   A5         A6   A5   A8   A7
		A6  A5               A9   A7   A8         A10  A9   A12  A11
		..  ..               ..   ..   ..         ..   ..   ..   ..
		========	     ============	  ===================

		If 'raid10_copies 2' and 'raid10_format offset', then the
		layouts for 2, 3 and 4 devices are:

		========       ==========         ================
		2 drives       3 drives           4 drives
		========       ==========         ================
		A1  A2         A1  A2  A3         A1  A2  A3  A4
		A2  A1         A3  A1  A2         A2  A1  A4  A3
		A3  A4         A4  A5  A6         A5  A6  A7  A8
		A4  A3         A6  A4  A5         A6  A5  A8  A7
		A5  A6         A7  A8  A9         A9  A10 A11 A12
		A6  A5         A9  A7  A8         A10 A9  A12 A11
		..  ..         ..  ..  ..         ..  ..  ..  ..
		========       ==========         ================

		Here we see layouts closely akin to 'RAID1E - Integrated
		Offset Stripe Mirroring'.

        [delta_disks <N>]
		The delta_disks option value (-251 < N < +251) triggers
		device removal (negative value) or device addition (positive
		value) to any reshape supporting raid levels 4/5/6 and 10.
		RAID levels 4/5/6 allow for addition of devices (metadata
		and data device tuple), raid10_near and raid10_offset only
		allow for device addition. raid10_far does not support any
		reshaping at all.
		A minimum number of devices has to be kept to enforce
		resilience, which is 3 devices for raid4/5 and 4 devices
		for raid6.

        [data_offset <sectors>]
		This option value defines the offset into each data device
		where the data starts. This is used to provide out-of-place
		reshaping space to avoid writing over data while
		changing the layout of stripes, hence an interruption/crash
		may happen at any time without the risk of losing data.
		E.g. when adding devices to an existing raid set during
		forward reshaping, the out-of-place space will be allocated
		at the beginning of each raid device. The kernel raid4/5/6/10
		MD personalities supporting such device addition will read the data from
		the existing first stripes (those with smaller number of stripes)
		starting at data_offset to fill up a new stripe with the larger
		number of stripes, calculate the redundancy blocks (CRC/Q-syndrome)
		and write that new stripe to offset 0. The same will be applied to all
		N-1 other new stripes. This out-of-place scheme is used to change
		the RAID type (i.e. the allocation algorithm) as well, e.g.
		changing from raid5_ls to raid5_n.

	[journal_dev <dev>]
		This option adds a journal device to raid4/5/6 raid sets and
		uses it to close the 'write hole' caused by the non-atomic updates
		to the component devices which can cause data loss during recovery.
		The journal device is used as writethrough thus causing writes to
		be throttled versus non-journaled raid4/5/6 sets.
		Takeover/reshape is not possible with a raid4/5/6 journal device;
		it has to be deconfigured before requesting these.

	[journal_mode <mode>]
		This option sets the caching mode on journaled raid4/5/6 raid sets
		(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
		If 'writeback' is selected the journal device has to be resilient
		and must not suffer from the 'write hole' problem itself (e.g. use
		raid1 or raid10) to avoid a single point of failure.

<#raid_devs>: The number of devices composing the array.
	Each device consists of two entries.  The first is the device
	containing the metadata (if any); the second is the one containing the
	data. A maximum of 64 metadata/data device entries are supported
	up to target version 1.8.0.
	1.9.0 supports up to 253 which is enforced by the used MD kernel runtime.

	If a drive has failed or is missing at creation time, a '-' can be
	given for both the metadata and data drives for a given position.
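
The default RAID10 "near" layout shown in the first raid10 table above can
be reproduced with a few lines of Python.  This is only an illustrative
sketch, not the MD implementation: the function name ``raid10_near_layout``
and its arguments are made up for this example, and the "far" and "offset"
formats are deliberately not covered.

.. code-block:: python

    def raid10_near_layout(num_devices, copies=2, rows=4):
        """Return `rows` rows of chunk labels for the "near" layout.

        Each chunk is written `copies` times to consecutive slots, and
        the slots are filled row by row across the devices.
        """
        if not 1 <= copies <= num_devices:
            raise ValueError("need 1 <= copies <= num_devices")
        layout = []
        for row in range(rows):
            cells = []
            for dev in range(num_devices):
                slot = row * num_devices + dev      # running slot number
                cells.append("A%d" % (slot // copies + 1))
            layout.append(cells)
        return layout

    # Print the 3-drive column of the "near" table.
    for row in raid10_near_layout(3):
        print("  ".join(row))

With 2 or 4 devices the same function yields the 2-drive and 4-drive
columns of that table.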


Example Tables
--------------

::

  # RAID4 - 4 data drives, 1 parity (no metadata devices)
  # No metadata devices specified to hold superblock/bitmap info
  # Chunk size of 1MiB
  # (Lines separated for easy reading)

  0 1960893648 raid \
          raid4 1 2048 \
          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

  # RAID4 - 4 data drives, 1 parity (with metadata devices)
  # Chunk size of 1MiB, force RAID initialization,
  #       min recovery rate at 20 kiB/sec/disk

  0 1960893648 raid \
          raid4 4 2048 sync min_recovery_rate 20 \
          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
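
The way the counts in such a table line are derived can be sketched in
Python.  This is a hedged illustration only: ``build_raid_table`` and
``kib_to_sectors`` are hypothetical helper names, not part of any tool, and
the sketch assumes the usual 512-byte device-mapper sector.  Note that
<#raid_params> counts the chunk size plus every word of the optional
parameters, and <#raid_devs> counts metadata/data pairs.

.. code-block:: python

    SECTOR_SIZE = 512  # assumption: device-mapper tables use 512-byte sectors


    def kib_to_sectors(kib):
        """Convert a size in KiB to 512-byte sectors."""
        return kib * 1024 // SECTOR_SIZE


    def build_raid_table(start, length, raid_type, chunk_sectors,
                         optional_params, devices):
        """Assemble a single-line "raid" target table.

        optional_params: flat list of words, e.g. ["sync", "min_recovery_rate", 20]
        devices: list of (metadata_dev, data_dev) pairs; "-" means "none"
        """
        raid_params = [str(chunk_sectors)] + [str(p) for p in optional_params]
        fields = [str(start), str(length), "raid", raid_type,
                  str(len(raid_params))] + raid_params
        fields.append(str(len(devices)))
        for metadata_dev, data_dev in devices:
            fields += [metadata_dev, data_dev]
        return " ".join(fields)


    # First example above: 1 MiB chunks, no metadata devices.
    data_devs = ["8:17", "8:33", "8:49", "8:65", "8:81"]
    print(build_raid_table(0, 1960893648, "raid4", kib_to_sectors(1024),
                           [], [("-", dev) for dev in data_devs]))

The resulting string is what would be passed to 'dmsetup create' as the
table; the sketch itself does not touch any device.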


Status Output
-------------
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the table.
Arguments that can be repeated are ordered by value.


'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity)::

  1: <s> <l> raid \
  2:      <raid_type> <#devices> <health_chars> \
  3:      <sync_ratio> <sync_action> <mismatch_cnt>

Line 1 is the standard output produced by device-mapper.

Lines 2 and 3 are produced by the raid target and are best explained by
example::

        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0

Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery.  Here is a fuller description of the individual fields:

	=============== =========================================================
	<raid_type>     Same as the <raid_type> used to create the array.
	<health_chars>  One char for each device, indicating:

			- 'A' = alive and in-sync
			- 'a' = alive but not in-sync
			- 'D' = dead/failed.
	<sync_ratio>    The ratio indicating how much of the array has undergone
			the process described by 'sync_action'.  If the
			'sync_action' is "check" or "repair", then the process
			of "resync" or "recover" can be considered complete.
	<sync_action>   One of the following possible states:

			idle
				- No synchronization action is being performed.
			frozen
				- The current action has been halted.
			resync
				- Array is undergoing its initial synchronization
				  or is resynchronizing after an unclean shutdown
				  (possibly aided by a bitmap).
			recover
				- A device in the array is being rebuilt or
				  replaced.
			check
				- A user-initiated full check of the array is
				  being performed.  All blocks are read and
				  checked for consistency.  The number of
				  discrepancies found is recorded in
				  <mismatch_cnt>.  No changes are made to the
				  array by this action.
			repair
				- The same as "check", but discrepancies are
				  corrected.
			reshape
				- The array is undergoing a reshape.
	<mismatch_cnt>  The number of discrepancies found between mirror copies
			in RAID1/10 or wrong parity values found in RAID4/5/6.
			This value is valid only after a "check" of the array
			is performed.  A healthy array has a 'mismatch_cnt' of 0.
	<data_offset>   The current data offset to the start of the user data on
			each component device of a raid set (see the respective
			raid parameter to support out-of-place reshaping).
	<journal_char>	- 'A' - active write-through journal device.
			- 'a' - active write-back journal device.
			- 'D' - dead journal device.
			- '-' - no journal device.
	=============== =========================================================
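
The fields above can be picked apart with a short Python sketch.  It is
illustrative only: ``parse_raid_status`` and ``RaidStatus`` are names
invented for this example, and any fields following <mismatch_cnt> (such as
<data_offset> and <journal_char> on newer target versions) are simply kept
as a list rather than interpreted.

.. code-block:: python

    from collections import namedtuple

    RaidStatus = namedtuple(
        "RaidStatus",
        "start length raid_type num_devices health synced total "
        "sync_action mismatch_cnt extra")


    def parse_raid_status(line):
        """Split one 'dmsetup status' line of a "raid" target into fields."""
        fields = line.split()
        if len(fields) < 9 or fields[2] != "raid":
            raise ValueError("not a dm-raid status line: %r" % line)
        synced, total = (int(part) for part in fields[6].split("/"))
        return RaidStatus(
            start=int(fields[0]),
            length=int(fields[1]),
            raid_type=fields[3],
            num_devices=int(fields[4]),
            health=fields[5],
            synced=synced,
            total=total,
            sync_action=fields[7],
            mismatch_cnt=int(fields[8]),
            extra=fields[9:],          # uninterpreted trailing fields
        )


    status = parse_raid_status(
        "0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0")
    # Indices of devices reported as dead/failed ('D').
    failed = [i for i, char in enumerate(status.health) if char == "D"]
    print(status.raid_type, status.sync_action, failed)

Dividing ``synced`` by ``total`` gives the fraction of the array that has
undergone the current sync action.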


Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message' interface.
(See 'man dmsetup' for more information on the message interface.)  These
actions include:

	========= ================================================
	"idle"    Halt the current sync action.
	"frozen"  Freeze the current sync action.
	"resync"  Initiate/continue a resync.
	"recover" Initiate/continue a recover process.
	"check"   Initiate a check (i.e. a "scrub") of the array.
	"repair"  Initiate a repair of the array.
	========= ================================================
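
As a rough sketch, a script could assemble the corresponding 'dmsetup
message' invocation as follows.  The helper name ``dmsetup_message_argv``
and the device name ``my_raid`` are hypothetical; the sketch only builds
the argument list and leaves running it (as root, e.g. via
``subprocess.run``) to the caller.  Sector 0 is used on the assumption that
the raid target is the first target in the table.

.. code-block:: python

    SYNC_MESSAGES = ("idle", "frozen", "resync", "recover", "check", "repair")


    def dmsetup_message_argv(device_name, action):
        """Return the argv for sending a sync action to a dm-raid device."""
        if action not in SYNC_MESSAGES:
            raise ValueError("unsupported dm-raid message: %r" % action)
        # General form: dmsetup message <device_name> <sector> <message>
        return ["dmsetup", "message", device_name, "0", action]


    # Build the command that would start a scrub of the example device.
    print(" ".join(dmsetup_message_argv("my_raid", "check")))

Rejecting unknown actions up front keeps typos from reaching the kernel
interface.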


Discard Support
---------------
The implementation of discard support among hardware vendors varies.
When a block is discarded, some storage devices will return zeroes when
the block is read.  These devices set the 'discard_zeroes_data'
attribute.  Other devices will return random data.  Confusingly, some
devices that advertise 'discard_zeroes_data' will not reliably return
zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
from a number of devices to calculate parity blocks and (for performance
reasons) relies on 'discard_zeroes_data' being reliable, it is important
that the devices be consistent.  Blocks may be discarded in the middle
of a RAID 4/5/6 stripe and if subsequent read results are not
consistent, the parity blocks may be calculated differently at any time;
making the parity blocks useless for redundancy.  It is important to
understand how your hardware behaves with discards if you are going to
enable discards with RAID 4/5/6.

Since the behavior of storage devices is unreliable in this respect,
even when reporting 'discard_zeroes_data', by default RAID 4/5/6
discard support is disabled -- this ensures data integrity at the
expense of losing some performance.

Storage devices that properly support 'discard_zeroes_data' are
increasingly whitelisted in the kernel and can thus be trusted.

For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:

    'devices_handle_discard_safely'


Version History
---------------

::

 1.0.0	Initial version.  Support for RAID 4/5/6
 1.1.0	Added support for RAID 1
 1.2.0	Handle creation of arrays that contain failed devices.
 1.3.0	Added support for RAID 10
 1.3.1	Allow device replacement/rebuild for RAID 10
 1.3.2	Fix/improve redundancy checking for RAID10
 1.4.0	Non-functional change.  Removes arg from mapping function.
 1.4.1	RAID10 fix redundancy validation checks (commit 55ebbb5).
 1.4.2	Add RAID10 "far" and "offset" algorithm support.
 1.5.0	Add message interface to allow manipulation of the sync_action.
	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
 1.5.1	Add ability to restore transiently failed devices on resume.
 1.5.2	'mismatch_cnt' is zero unless [last_]sync_action is "check".
 1.6.0	Add discard support (and devices_handle_discard_safely module param).
 1.7.0	Add support for MD RAID0 mappings.
 1.8.0	Explicitly check for compatible flags in the superblock metadata
	and refuse to start the raid set if any are set by a newer
	target version, thus avoiding data corruption on a raid set
	with a reshape in progress.
 1.9.0	Add support for RAID level takeover/reshape/region size
	and set size reduction.
 1.9.1	Fix activation of existing RAID 4/10 mapped devices
 1.9.2	Don't emit '- -' on the status table line in case the constructor
	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
	'D' on the status line.  If '- -' is passed into the constructor, emit
	'- -' on the table line and '-' as the status line health character.
 1.10.0	Add support for raid4/5/6 journal device
 1.10.1	Fix data corruption on reshape request
 1.11.0	Fix table line argument order
	(wrong raid10_copies/raid10_format sequence)
 1.11.1	Add raid4/5/6 journal write-back support via journal_mode option
 1.12.1	Fix for MD deadlock between mddev_suspend() and md_write_start() available
 1.13.0	Fix dev_health status at end of "recover" (was 'a', now 'A')
 1.13.1	Fix deadlock caused by early md_stop_writes().  Also fix size and
	state races.
 1.13.2	Fix raid redundancy validation and avoid keeping raid set frozen
 1.14.0	Fix reshape race on small devices.  Fix stripe adding reshape
	deadlock/potential data corruption.  Update superblock when
	specific devices are requested via rebuild.  Fix RAID leg
	rebuild errors.
 1.15.0 Fix size extensions not being synchronized in case of new MD bitmap
        pages allocated;  also fix those not occurring after previous reductions
 1.15.1 Fix argument count and arguments for rebuild/write_mostly/journal_(dev|mode)
        on the status line.