=============
dm-log-writes
=============

This target takes 2 devices, one to pass all IO to normally, and one to log all
of the write operations to.  This is intended for file system developers wishing
to verify the integrity of metadata or data as the file system is written to.
There is a log_write_entry written for every WRITE request and the target is
able to take arbitrary data from userspace to insert into the log.  The data
that is in the WRITE requests is copied into the log to make the replay happen
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache.  This means that normal WRITE requests are not actually logged until the
next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
the log in a way that correlates to what is on disk and not what is in cache,
to make it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_PREFLUSH request we splice this list onto the request and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH.
Only WRITEs that have completed at the time the REQ_PREFLUSH is issued are
added, in order to simulate the worst case scenario with regard to power
failures.  Consider the following example (W means write, C means complete)::

  W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following::

  W3,W2,flush,W1....

Again, this is to simulate what is actually on disk.  It allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, as those requests bypass the device cache.

Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
request.  Consider the following example::

  WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this::

  DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.
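
The ordering rules above can be illustrated with a small shell sketch.  This is
a toy model only, not the kernel implementation: the function name
``simulate_log`` and the event names are assumptions made for the example, and
FUA and DISCARD handling are left out.

```shell
#!/bin/bash
# Toy model of the ordering rules described above; not the kernel code.
# Events: W<n> = write n issued, C<n> = write n completed,
#         Wflush = flush issued, Cflush = flush completed.
simulate_log() {
    local pending=() spliced=() log=() ev
    for ev in "$@"; do
        case "$ev" in
            Wflush)
                # Flush issued: splice the completed, unlogged writes onto it.
                spliced=("${pending[@]}")
                pending=()
                ;;
            Cflush)
                # Flush completed: log the spliced writes, then the flush.
                log+=("${spliced[@]}" "flush")
                spliced=()
                ;;
            C*)
                # A write completed: queue it until the next flush is issued.
                pending+=("W${ev#C}")
                ;;
            W*)
                # A write was issued: nothing is logged at submission time.
                ;;
        esac
    done
    echo "log: ${log[*]}"
    echo "pending: ${pending[*]}"
}

simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush
```

For the example sequence this prints ``log: W3 W2 flush`` and ``pending: W1``:
W1 completed after the flush was issued, so it waits for the next flush.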

Target interface
================

i) Constructor

   log-writes <dev_path> <log_dev_path>

   ============= ==============================================
   dev_path      Device that all of the IO will go to normally.
   log_dev_path  Device where the log entries are written to.
   ============= ==============================================

ii) Status

    <#logged entries> <highest allocated sector>

    =========================== ========================
    #logged entries             Number of logged entries
    highest allocated sector    Highest allocated sector
    =========================== ========================

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example, say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        you are fsck'ing something reasonable.  You would do something
        like this::

          mkfs.btrfs -f /dev/mapper/log
          dmsetup message log 0 mark mkfs
          <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on doing the fsck check in the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system.
You would do something like
this::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <some test that does fsync at the end>
  dmsetup message log 0 mark fsync
  md5sum /mnt/btrfs-test/foo
  umount /mnt/btrfs-test

  dmsetup remove log
  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
  mount /dev/sdb /mnt/btrfs-test
  md5sum /mnt/btrfs-test/foo
  <verify md5sum's are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation.
You could do this with::

  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs

  mount /dev/mapper/log /mnt/btrfs-test
  <fsstress to dirty the fs>
  btrfs filesystem balance /mnt/btrfs-test
  umount /mnt/btrfs-test
  dmsetup remove log

  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
  btrfsck /dev/sdb
  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
          --fsck "btrfsck /dev/sdb" --check fua

That will replay the log until it sees a FUA request and run the fsck command.
If the fsck passes, it will replay to the next FUA, and so on until the replay
is completed or the fsck command exits abnormally.
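
Both examples build the same table line by hand, and the status line is easy to
misread.  The following is a minimal sketch of two helpers for scripts that
drive this target.  The function names are hypothetical and not part of
dmsetup.  It assumes that ``dmsetup status <name>`` prefixes the two
target-specific fields with the usual start sector, length and target type.

```shell
#!/bin/bash
# Hypothetical helpers; the function names are illustrative only.

# Build a log-writes table line: <start> <length> log-writes <dev> <log_dev>.
# $1 = device size in 512-byte sectors (e.g. from "blockdev --getsz").
log_writes_table() {
    local sectors=$1 dev=$2 log_dev=$3
    echo "0 ${sectors} log-writes ${dev} ${log_dev}"
}

# Split one status line for a log-writes target into its two
# target-specific fields; fail if the line is for another target type.
parse_log_writes_status() {
    local start length ttype entries highest
    read -r start length ttype entries highest <<< "$1"
    [ "$ttype" = "log-writes" ] || return 1
    echo "entries=${entries} highest_sector=${highest}"
}

# Sample values below are made up for illustration.
log_writes_table 2097152 /dev/sdb /dev/sdc
parse_log_writes_status "0 2097152 log-writes 42 81920"
```

In a real script the first helper would be fed from ``blockdev --getsz`` and
the second from ``dmsetup status log``.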