================================
Device-mapper "unstriped" target
================================

Introduction
============

The device-mapper "unstriped" target provides a transparent mechanism to
unstripe a device-mapper "striped" target, giving access to the underlying
disks without having to touch the true backing block device.  It can also
be used to unstripe a hardware RAID 0 to access its backing disks.

Parameters:
<number of stripes> <chunk size> <stripe #> <dev_path> <offset>

<number of stripes>
        The number of stripes in the RAID 0.

<chunk size>
        The number of 512-byte sectors in each chunk of the striping.

<stripe #>
        The stripe number within the device that corresponds to the
        physical drive you wish to unstripe.  This must be 0 indexed.

<dev_path>
        The block device you wish to unstripe.

<offset>
        The starting sector within <dev_path> at which the striped data
        begins.  All examples in this document use 0.


Why use this module?
====================

An example of undoing an existing dm-stripe
-------------------------------------------

This small bash script sets up 4 loop devices and uses the existing
striped target to combine the 4 devices into one.  It then uses
the unstriped target on top of the striped device to access the
individual backing loop devices.  We write data to the newly exposed
unstriped devices and verify that the data written matches the correct
underlying device on the striped array::

  #!/bin/bash

  MEMBER_SIZE=$((128 * 1024 * 1024))
  NUM=4
  SEQ_END=$((${NUM}-1))
  CHUNK=256
  BS=4096

  # Sizes in 512-byte sectors: the whole array and a single member
  RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
  MEMBER_SECTORS=$((${MEMBER_SIZE}/512))
  DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
  COUNT=$((${MEMBER_SIZE} / ${BS}))

  # Create the backing files and attach each one to a loop device
  for i in $(seq 0 ${SEQ_END}); do
    dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
    losetup /dev/loop${i} member-${i}
    DM_PARMS+=" /dev/loop${i} 0"
  done

  # Build the striped device, then one unstriped device per member
  echo "${DM_PARMS}" | dmsetup create raid0
  for i in $(seq 0 ${SEQ_END}); do
    echo "0 ${MEMBER_SECTORS} unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
  done

  # Write random data through each unstriped device and compare it with
  # the matching backing file
  for i in $(seq 0 ${SEQ_END}); do
    dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
    diff /dev/mapper/set-${i} member-${i}
  done

  # Tear everything down in reverse order
  for i in $(seq 0 ${SEQ_END}); do
    dmsetup remove set-${i}
  done

  dmsetup remove raid0

  for i in $(seq 0 ${SEQ_END}); do
    losetup -d /dev/loop${i}
    rm -f member-${i}
  done

Another example
---------------

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k chunk on each core, resulting
in a 256k stripe across the two cores::

   Core 0:       Core 1:
  __________    __________
  | LBA 512|    | LBA 768|
  | LBA 0  |    | LBA 256|
  ----------    ----------
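
The layout in the diagram can be reproduced with a few lines of shell.
This is only an illustrative sketch: ``core_of_lba`` is a hypothetical
helper, and it assumes the LBAs in the diagram are 512-byte sectors, so
that a 128k chunk is 256 sectors:

```shell
#!/bin/bash

# Hypothetical helper: which core holds a given LBA of the aggregate drive.
# Arguments: <number of cores> <chunk size in sectors> <LBA>
core_of_lba() {
  local cores=$1 chunk=$2 lba=$3
  echo $(( (lba / chunk) % cores ))
}

# Walk the first four chunk boundaries shown in the diagram
for lba in 0 256 512 768; do
  echo "LBA ${lba} -> core $(core_of_lba 2 256 "${lba}")"
done
```

With these assumptions, LBAs 0 and 512 land on core 0 and LBAs 256 and 768
land on core 1, matching the diagram.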

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments.  When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition.  This is because the partitions
are striped across the two cores.  When we unstripe this hardware RAID 0
and make partitions on each newly exposed device, the two partitions are
physically separated.

With the dm-unstriped target we are able to segregate an fio script that
has read and write jobs that are independent of each other.  Compared to
running the same test on a combined drive with partitions, we were able
to get a 92% reduction in read latency using this device-mapper target.


Example dmsetup usage
=====================

unstriped on top of Intel NVMe device that has 2 cores
------------------------------------------------------

::

  dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
  dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'

There will now be two devices that expose Intel NVMe core 0 and 1
respectively::

  /dev/mapper/nvmset0
  /dev/mapper/nvmset1

unstriped on top of striped with 4 drives using 128K chunk size
---------------------------------------------------------------

::

  dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
  dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
  dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
  dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
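
The tables above use a length of 512 sectors for illustration.  To expose
a whole stripe, the length can be derived from the size of the striped
device.  The sketch below is an assumption-laden illustration, not a
documented recipe: ``unstriped_len`` is a hypothetical helper, and it
assumes the target expects a length that is a whole number of chunks:

```shell
#!/bin/bash

# Hypothetical helper: length in 512-byte sectors of one unstriped device.
# Arguments: <total sectors of striped device> <number of stripes> <chunk size>
unstriped_len() {
  local total=$1 stripes=$2 chunk=$3
  local per_stripe=$(( total / stripes ))
  # Round down to a whole number of chunks
  echo $(( per_stripe / chunk * chunk ))
}

# A 512 MiB striped device (1048576 sectors), 4 stripes, 256-sector chunks
unstriped_len 1048576 4 256
```

The total can be obtained with ``blockdev --getsz <dev_path>``, which
reports the device size in 512-byte sectors; the result is then used as
the second field of the table line.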