================================
Device-mapper "unstriped" target
================================

Introduction
============

The device-mapper "unstriped" target provides a transparent mechanism to
unstripe a device-mapper "striped" target to access the underlying disks
without having to touch the true backing block-device.  It can also be
used to unstripe a hardware RAID-0 to access backing disks.

Parameters:
<number of stripes> <chunk size> <stripe #> <dev_path> <offset>

<number of stripes>
        The number of stripes in the RAID 0.

<chunk size>
        The number of 512B sectors in each chunk.

<dev_path>
        The block device you wish to unstripe.

<stripe #>
        The stripe number within the device that corresponds to the physical
        drive you wish to unstripe.  This must be 0 indexed.


Why use this module?
====================

An example of undoing an existing dm-stripe
-------------------------------------------

This small bash script will set up 4 loop devices and use the existing
striped target to combine the 4 devices into one.  It then will use
the unstriped target on top of the striped device to access the
individual backing loop devices.  We write data to the newly exposed
unstriped devices and verify the data written matches the correct
underlying device on the striped array::

  #!/bin/bash

  MEMBER_SIZE=$((128 * 1024 * 1024))
  NUM=4
  SEQ_END=$((${NUM}-1))
  CHUNK=256
  BS=4096

  RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
  DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
  COUNT=$((${MEMBER_SIZE} / ${BS}))

  # Create the backing files, attach them as loop devices and
  # append each one to the striped table.
  for i in $(seq 0 ${SEQ_END}); do
    dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
    losetup /dev/loop${i} member-${i}
    DM_PARMS+=" /dev/loop${i} 0"
  done

  echo $DM_PARMS | dmsetup create raid0
  for i in $(seq 0 ${SEQ_END}); do
    # Each unstriped device exposes one member's worth of sectors.
    echo "0 $((${MEMBER_SIZE} / 512)) unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
  done;

  # Write random data through each unstriped device and compare it
  # with the corresponding backing file.
  for i in $(seq 0 ${SEQ_END}); do
    dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
    diff /dev/mapper/set-${i} member-${i}
  done;

  for i in $(seq 0 ${SEQ_END}); do
    dmsetup remove set-${i}
  done

  dmsetup remove raid0

  for i in $(seq 0 ${SEQ_END}); do
    losetup -d /dev/loop${i}
    rm -f member-${i}
  done

Another example
---------------

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k chunk on each core, resulting
in a 256k stripe across the two cores::

   Core 0:       Core 1:
  __________    __________
  | LBA 512|    | LBA 768|
  | LBA 0  |    | LBA 256|
  ----------    ----------

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition.  This is because the partitions
are striped across the two cores.  When we unstripe this hardware RAID 0
and make partitions on each new exposed device the two partitions are now
physically separated.

With the dm-unstriped target we're able to segregate an fio script that
has read and write jobs that are independent of each other.  Compared to
when we run the test on a combined drive with partitions, we were able
to get a 92% reduction in read latency using this device mapper target.
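
The LBA layout in the diagram above follows from plain chunk arithmetic.
The sketch below is illustrative only: the function name
``unstriped_to_striped`` is hypothetical and not part of any tool.  It
assumes the usual RAID 0 layout, in which chunk ``c`` of stripe ``k`` sits
at chunk ``c * <number of stripes> + k`` of the combined device, and prints
where the first two chunks of each core land on the aggregate drive.

```shell
#!/bin/bash

# Map a sector on an unstriped device to the sector on the striped device.
# Hypothetical helper, assuming the standard RAID 0 chunk layout.
# Usage: unstriped_to_striped <stripes> <chunk_sectors> <stripe_idx> <sector>
unstriped_to_striped() {
    local stripes=$1 chunk=$2 stripe=$3 sector=$4
    local chunk_idx=$((sector / chunk))   # which chunk of this stripe
    local within=$((sector % chunk))      # offset inside that chunk
    echo $(( (chunk_idx * stripes + stripe) * chunk + within ))
}

# Two cores, 256-sector (128k) chunks, as in the diagram.
for core in 0 1; do
    for sector in 0 256; do
        echo "core ${core} sector ${sector} -> LBA $(unstriped_to_striped 2 256 ${core} ${sector})"
    done
done
```

With two stripes and a chunk size of 256 sectors, sectors 0 and 256 of
core 0 map to LBA 0 and 512, and the same sectors of core 1 map to LBA 256
and 768, matching the diagram.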


Example dmsetup usage
=====================

unstriped on top of Intel NVMe device that has 2 cores
------------------------------------------------------

::

  dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
  dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'

There will now be two devices that expose Intel NVMe core 0 and 1
respectively::

  /dev/mapper/nvmset0
  /dev/mapper/nvmset1

unstriped on top of striped with 4 drives using 128K chunk size
---------------------------------------------------------------

::

  dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
  dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
  dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
  dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
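
The per-stripe commands above differ only in the stripe number, so the
table lines can be generated in a loop.  The following is a minimal
dry-run sketch: ``unstriped_tables`` is a hypothetical helper that only
prints table lines and never calls dmsetup.  The check that the length is
a multiple of the chunk size is an assumption about what the target
accepts, not something stated in this document.

```shell
#!/bin/bash

# Print one "unstriped" table line per stripe; nothing is created.
# Hypothetical helper for illustration only.
# Usage: unstriped_tables <length> <stripes> <chunk_sectors> <dev_path> <offset>
unstriped_tables() {
    local length=$1 stripes=$2 chunk=$3 dev=$4 offset=$5
    local i
    # Assumption: the target length should be a whole number of chunks.
    if (( length % chunk != 0 )); then
        echo "length ${length} is not a multiple of chunk size ${chunk}" >&2
        return 1
    fi
    for (( i = 0; i < stripes; i++ )); do
        echo "0 ${length} unstriped ${stripes} ${chunk} ${i} ${dev} ${offset}"
    done
}

# Reproduce the four table lines for the 4-drive example.
unstriped_tables 512 4 256 /dev/mapper/striped 0
```

Each printed line could then be passed to ``dmsetup create <name> --table``
by hand once it has been checked against the real device layout.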