=======
dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.


Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters::

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:

  ============= ===============================================================
  raid0         RAID0 striping (no resilience)
  raid1         RAID1 mirroring
  raid4         RAID4 with dedicated last parity disk
  raid5_n       RAID5 with dedicated last parity disk supporting takeover
                Same as raid4

                - Transitory layout
  raid5_la      RAID5 left asymmetric

                - rotating parity 0 with data continuation
  raid5_ra      RAID5 right asymmetric

                - rotating parity N with data continuation
  raid5_ls      RAID5 left symmetric

                - rotating parity 0 with data restart
  raid5_rs      RAID5 right symmetric

                - rotating parity N with data restart
  raid6_zr      RAID6 zero restart

                - rotating parity zero (left-to-right) with data restart
  raid6_nr      RAID6 N restart

                - rotating parity N (right-to-left) with data restart
  raid6_nc      RAID6 N continue

                - rotating parity N (right-to-left) with data continuation
  raid6_n_6     RAID6 with dedicated parity disks

                - parity and Q-syndrome on the last 2 disks;
                  layout for takeover from/to raid4/raid5_n
  raid6_la_6    Same as "raid5_la" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_la from/to raid6
  raid6_ra_6    Same as "raid5_ra" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_ra from/to raid6
  raid6_ls_6    Same as "raid5_ls" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_ls from/to raid6
  raid6_rs_6    Same as "raid5_rs" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_rs from/to raid6
  raid10        Various RAID10 inspired algorithms chosen by additional params
                (see raid10_format and raid10_copies below)

                - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
                - RAID1E: Integrated Adjacent Stripe Mirroring
                - RAID1E: Integrated Offset Stripe Mirroring
                - and other similar RAID10 variants
  ============= ===============================================================

  Reference: Chapter 4 of
  https://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of

    Mandatory parameters:
        <chunk_size>:
                      Chunk size in sectors.  This parameter is often known as
                      "stripe size".  It is the only mandatory parameter and
                      is placed first.

    followed by optional parameters (in any order):
        [sync|nosync]
                Force or prevent RAID initialization.

        [rebuild <idx>]
                Rebuild drive number 'idx' (first drive is 0).

        [daemon_sleep <ms>]
                Interval between runs of the bitmap daemon that
                clear bits.
                A longer interval means less bitmap I/O but
                resyncing after a failure is likely to take longer.

        [min_recovery_rate <kB/sec/disk>]
                Throttle RAID initialization
        [max_recovery_rate <kB/sec/disk>]
                Throttle RAID initialization
        [write_mostly <idx>]
                Mark drive index 'idx' write-mostly.
        [max_write_behind <sectors>]
                See '--write-behind=' (man mdadm)
        [stripe_cache <sectors>]
                Stripe cache size (RAID 4/5/6 only)
        [region_size <sectors>]
                The region_size multiplied by the number of regions is the
                logical size of the array.  The bitmap records the device
                synchronisation state for each region.

        [raid10_copies   <# copies>], [raid10_format   <near|far|offset>]
                These two options are used to alter the default layout of
                a RAID10 configuration.  The number of copies can be
                specified, but the default is 2.  There are also three
                variations to how the copies are laid down - the default
                is "near".  Near copies are what most people think of with
                respect to mirroring.
                If these options are left unspecified,
                or 'raid10_copies 2' and/or 'raid10_format near' are given,
                then the layouts for 2, 3 and 4 devices are:

                ========  ==========  ==============
                2 drives  3 drives    4 drives
                ========  ==========  ==============
                A1  A1    A1  A1  A2  A1  A1  A2  A2
                A2  A2    A2  A3  A3  A3  A3  A4  A4
                A3  A3    A4  A4  A5  A5  A5  A6  A6
                A4  A4    A5  A6  A6  A7  A7  A8  A8
                ..  ..    ..  ..  ..  ..  ..  ..  ..
                ========  ==========  ==============

                The 2-device layout is equivalent to 2-way RAID1.  The 4-device
                layout is what a traditional RAID10 would look like.  The
                3-device layout is what might be called a 'RAID1E - Integrated
                Adjacent Stripe Mirroring'.

                If 'raid10_copies 2' and 'raid10_format far', then the layouts
                for 2, 3 and 4 devices are:

                ========  ============  ===================
                2 drives  3 drives      4 drives
                ========  ============  ===================
                A1  A2    A1  A2  A3    A1   A2   A3   A4
                A3  A4    A4  A5  A6    A5   A6   A7   A8
                A5  A6    A7  A8  A9    A9   A10  A11  A12
                ..  ..    ..  ..  ..    ..   ..   ..   ..
                A2  A1    A3  A1  A2    A2   A1   A4   A3
                A4  A3    A6  A4  A5    A6   A5   A8   A7
                A6  A5    A9  A7  A8    A10  A9   A12  A11
                ..  ..    ..  ..  ..    ..   ..   ..   ..
                ========  ============  ===================

                If 'raid10_copies 2' and 'raid10_format offset', then the
                layouts for 2, 3 and 4 devices are:

                ========  ==========  ================
                2 drives  3 drives    4 drives
                ========  ==========  ================
                A1  A2    A1  A2  A3  A1  A2  A3  A4
                A2  A1    A3  A1  A2  A2  A1  A4  A3
                A3  A4    A4  A5  A6  A5  A6  A7  A8
                A4  A3    A6  A4  A5  A6  A5  A8  A7
                A5  A6    A7  A8  A9  A9  A10 A11 A12
                A6  A5    A9  A7  A8  A10 A9  A12 A11
                ..  ..    ..  ..  ..  ..  ..  ..  ..
                ========  ==========  ================

                Here we see layouts closely akin to 'RAID1E - Integrated
                Offset Stripe Mirroring'.

        [delta_disks <N>]
                The delta_disks option value (-251 < N < +251) triggers
                device removal (negative value) or device addition (positive
                value) to any reshape supporting raid levels 4/5/6 and 10.
                RAID levels 4/5/6 allow for addition of devices (metadata
                and data device tuple), raid10_near and raid10_offset only
                allow for device addition. raid10_far does not support any
                reshaping at all.
                A minimum number of devices has to be kept to enforce
                resilience, which is 3 devices for raid4/5 and 4 devices
                for raid6.

        [data_offset <sectors>]
                This option value defines the offset into each data device
                where the data starts. This is used to provide out-of-place
                reshaping space to avoid writing over data while
                changing the layout of stripes, hence an interruption/crash
                may happen at any time without the risk of losing data.
                E.g. when adding devices to an existing raid set during
                forward reshaping, the out-of-place space will be allocated
                at the beginning of each raid device. The kernel raid4/5/6/10
                MD personalities supporting such device addition will read the data from
                the existing first stripes (those with smaller number of stripes)
                starting at data_offset to fill up a new stripe with the larger
                number of stripes, calculate the redundancy blocks (parity/Q-syndrome)
                and write that new stripe to offset 0. Same will be applied to all
                N-1 other new stripes.
                This out-of-place scheme is used to change
                the RAID type (i.e. the allocation algorithm) as well, e.g.
                changing from raid5_ls to raid5_n.

        [journal_dev <dev>]
                This option adds a journal device to raid4/5/6 raid sets and
                uses it to close the 'write hole' caused by the non-atomic updates
                to the component devices which can cause data loss during recovery.
                The journal device is used as writethrough thus causing writes to
                be throttled versus non-journaled raid4/5/6 sets.
                Takeover/reshape is not possible with a raid4/5/6 journal device;
                it has to be deconfigured before requesting these.

        [journal_mode <mode>]
                This option sets the caching mode on journaled raid4/5/6 raid sets
                (see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
                If 'writeback' is selected the journal device has to be resilient
                and must not suffer from the 'write hole' problem itself (e.g. use
                raid1 or raid10) to avoid a single point of failure.

<#raid_devs>: The number of devices composing the array.
        Each device consists of two entries.  The first is the device
        containing the metadata (if any); the second is the one containing the
        data.
        A maximum of 64 metadata/data device entries are supported
        up to target version 1.8.0.
        Version 1.9.0 supports up to 253, a limit enforced by the MD kernel
        runtime.

        If a drive has failed or is missing at creation time, a '-' can be
        given for both the metadata and data drives for a given position.


Example Tables
--------------

::

  # RAID4 - 4 data drives, 1 parity (no metadata devices)
  # No metadata devices specified to hold superblock/bitmap info
  # Chunk size of 1MiB
  # (Lines separated for easy reading)

  0 1960893648 raid \
          raid4 1 2048 \
          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

  # RAID4 - 4 data drives, 1 parity (with metadata devices)
  # Chunk size of 1MiB, force RAID initialization,
  #       min recovery rate at 20 kiB/sec/disk

  0 1960893648 raid \
          raid4 4 2048 sync min_recovery_rate 20 \
          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82


Status Output
-------------
'dmsetup table'
displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the table.
Arguments that can be repeated are ordered by value.


'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity)::

  1: <s> <l> raid \
  2:      <raid_type> <#devices> <health_chars> \
  3:      <sync_ratio> <sync_action> <mismatch_cnt>

Line 1 is the standard output produced by device-mapper.

Lines 2 & 3 are produced by the raid target and are best explained by example::

  0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0

Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery.  Here is a fuller description of the individual fields:

        =============== =========================================================
        <raid_type>     Same as the <raid_type> used to create the array.
        <health_chars>  One char for each device, indicating:

                        - 'A' = alive and in-sync
                        - 'a' = alive but not in-sync
                        - 'D' = dead/failed.
        <sync_ratio>    The ratio indicating how much of the array has undergone
                        the process described by 'sync_action'.  If the
                        'sync_action' is "check" or "repair", then the process
                        of "resync" or "recover" can be considered complete.
        <sync_action>   One of the following possible states:

                        idle
                                - No synchronization action is being performed.
                        frozen
                                - The current action has been halted.
                        resync
                                - Array is undergoing its initial synchronization
                                  or is resynchronizing after an unclean shutdown
                                  (possibly aided by a bitmap).
                        recover
                                - A device in the array is being rebuilt or
                                  replaced.
                        check
                                - A user-initiated full check of the array is
                                  being performed.  All blocks are read and
                                  checked for consistency.  The number of
                                  discrepancies found is recorded in
                                  <mismatch_cnt>.  No changes are made to the
                                  array by this action.
                        repair
                                - The same as "check", but discrepancies are
                                  corrected.
                        reshape
                                - The array is undergoing a reshape.
        <mismatch_cnt>  The number of discrepancies found between mirror copies
                        in RAID1/10 or wrong parity values found in RAID4/5/6.
                        This value is valid only after a "check" of the array
                        is performed.  A healthy array has a 'mismatch_cnt' of 0.
        <data_offset>   The current data offset to the start of the user data on
                        each component device of a raid set (see the respective
                        raid parameter to support out-of-place reshaping).
        <journal_char>  - 'A' - active write-through journal device.
                        - 'a' - active write-back journal device.
                        - 'D' - dead journal device.
                        - '-' - no journal device.
        =============== =========================================================


Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message' interface.
(See 'man dmsetup' for more information on the message interface.)  These
actions include:

        ========= ================================================
        "idle"    Halt the current sync action.
        "frozen"  Freeze the current sync action.
        "resync"  Initiate/continue a resync.
        "recover" Initiate/continue a recover process.
        "check"   Initiate a check (i.e. a "scrub") of the array.
        "repair"  Initiate a repair of the array.
        ========= ================================================


Discard Support
---------------
The implementation of discard support among hardware vendors varies.
When a block is discarded, some storage devices will return zeroes when
the block is read.  These devices set the 'discard_zeroes_data'
attribute.  Other devices will return random data.  Confusingly, some
devices that advertise 'discard_zeroes_data' will not reliably return
zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
from a number of devices to calculate parity blocks and (for performance
reasons) relies on 'discard_zeroes_data' being reliable, it is important
that the devices be consistent.  Blocks may be discarded in the middle
of a RAID 4/5/6 stripe and if subsequent read results are not
consistent, the parity blocks may be calculated differently at any time;
making the parity blocks useless for redundancy.
It is important to
understand how your hardware behaves with discards if you are going to
enable discards with RAID 4/5/6.

Since the behavior of storage devices is unreliable in this respect,
even when reporting 'discard_zeroes_data', by default RAID 4/5/6
discard support is disabled -- this ensures data integrity at the
expense of losing some performance.

Storage devices that properly support 'discard_zeroes_data' are
increasingly whitelisted in the kernel and can thus be trusted.

For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:

    'devices_handle_discard_safely'


Version History
---------------

::

 1.0.0   Initial version.  Support for RAID 4/5/6
 1.1.0   Added support for RAID 1
 1.2.0   Handle creation of arrays that contain failed devices.
 1.3.0   Added support for RAID 10
 1.3.1   Allow device replacement/rebuild for RAID 10
 1.3.2   Fix/improve redundancy checking for RAID10
 1.4.0   Non-functional change.  Removes arg from mapping function.
 1.4.1   RAID10 fix redundancy validation checks (commit 55ebbb5).
 1.4.2   Add RAID10 "far" and "offset" algorithm support.
 1.5.0   Add message interface to allow manipulation of the sync_action.
         New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
 1.5.1   Add ability to restore transiently failed devices on resume.
 1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".
 1.6.0   Add discard support (and devices_handle_discard_safely module param).
 1.7.0   Add support for MD RAID0 mappings.
 1.8.0   Explicitly check for compatible flags in the superblock metadata
         and refuse to start the raid set if any are set by a newer
         target version, thus avoiding data corruption on a raid set
         with a reshape in progress.
 1.9.0   Add support for RAID level takeover/reshape/region size
         and set size reduction.
 1.9.1   Fix activation of existing RAID 4/10 mapped devices
 1.9.2   Don't emit '- -' on the status table line in case the constructor
         fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
         'D' on the status line.  If '- -' is passed into the constructor, emit
         '- -' on the table line and '-' as the status line health character.
 1.10.0  Add support for raid4/5/6 journal device
 1.10.1  Fix data corruption on reshape request
 1.11.0  Fix table line argument order
         (wrong raid10_copies/raid10_format sequence)
 1.11.1  Add raid4/5/6 journal write-back support via journal_mode option
 1.12.1  Fix for MD deadlock between mddev_suspend() and md_write_start() available
 1.13.0  Fix dev_health status at end of "recover" (was 'a', now 'A')
 1.13.1  Fix deadlock caused by early md_stop_writes().  Also fix size and
         state races.
 1.13.2  Fix raid redundancy validation and avoid keeping raid set frozen
 1.14.0  Fix reshape race on small devices.  Fix stripe adding reshape
         deadlock/potential data corruption.  Update superblock when
         specific devices are requested via rebuild.  Fix RAID leg
         rebuild errors.
 1.15.0  Fix size extensions not being synchronized in case of new MD bitmap
         pages allocated; also fix those not occurring after previous reductions
 1.15.1  Fix argument count and arguments for rebuild/write_mostly/journal_(dev|mode)
         on the status line.
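
The status line described under "Status Output" is easy to consume from a
script.  The following is a minimal Python sketch, not part of dm-raid or
dmsetup: the names ``RaidStatus`` and ``parse_raid_status`` are hypothetical,
the input is assumed to be a single status line without a leading device
name, and the trailing <data_offset> and <journal_char> fields are assumed to
follow <mismatch_cnt> when the target emits them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RaidStatus:
    """Fields of one 'dmsetup status' line for a "raid" target (hypothetical type)."""
    start: int                       # <s>: start sector of the mapping
    length: int                      # <l>: length of the mapping in sectors
    raid_type: str
    num_devices: int
    health_chars: str                # one of 'A', 'a', 'D' (or '-') per device
    sync_ratio: Tuple[int, int]      # (done, total) parsed from "done/total"
    sync_action: str
    mismatch_cnt: int
    data_offset: Optional[int] = None    # assumed optional trailing field
    journal_char: Optional[str] = None   # assumed optional trailing field


def parse_raid_status(line: str) -> RaidStatus:
    """Split a status line on whitespace and map the fields by position."""
    fields = line.split()
    if len(fields) < 9 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line: %r" % line)
    done, total = fields[6].split("/")
    status = RaidStatus(
        start=int(fields[0]),
        length=int(fields[1]),
        raid_type=fields[3],
        num_devices=int(fields[4]),
        health_chars=fields[5],
        sync_ratio=(int(done), int(total)),
        sync_action=fields[7],
        mismatch_cnt=int(fields[8]),
    )
    if len(status.health_chars) != status.num_devices:
        raise ValueError("expected one health character per device")
    # Assumption: newer target versions append these two fields in this order.
    if len(fields) > 9:
        status.data_offset = int(fields[9])
    if len(fields) > 10:
        status.journal_char = fields[10]
    return status


if __name__ == "__main__":
    # The example line from the "Status Output" section.
    st = parse_raid_status("0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0")
    failed = [idx for idx, c in enumerate(st.health_chars) if c == "D"]
    print(st.raid_type, st.sync_ratio, st.sync_action, failed)
```

Indexing fields by position keeps the sketch short, but it relies on the
status line layout staying as documented above; a monitoring script would
typically also compare the target version before trusting the optional
trailing fields.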