===============
Persistent data
===============

Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in kernel. In late 2010 we were seeing that various
different targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets. It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks. There
is a read/write locking interface to prevent concurrent accesses, and
to keep data that is in use in the cache.

Clients of persistent-data are unlikely to use this directly.

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one. Shadowing is elided within
the same transaction so performance is reasonable. The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.
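
The shadowing rule above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: ``struct toy_tm``
and the ``toy_tm_*`` functions are hypothetical names invented for this
example, they do not mirror the kernel interface in
dm-transaction-manager.h, and the toy commit does not model flushing
data or writing a superblock. It only shows that a committed block is
copied on first write and that repeated shadowing inside one
transaction is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_BLOCKS	16
#define TOY_BLOCK_SIZE	8

/* Hypothetical in-memory stand-in for a transaction manager. */
struct toy_tm {
	unsigned char data[TOY_NR_BLOCKS][TOY_BLOCK_SIZE];
	bool fresh[TOY_NR_BLOCKS];	/* created in the open transaction */
	unsigned next_free;		/* trivial bump allocator */
};

static void toy_tm_init(struct toy_tm *tm)
{
	memset(tm, 0, sizeof(*tm));
}

/* Allocate a fresh block; it is writable until the next commit. */
static unsigned toy_tm_new_block(struct toy_tm *tm)
{
	unsigned b;

	assert(tm->next_free < TOY_NR_BLOCKS);
	b = tm->next_free++;
	tm->fresh[b] = true;
	return b;
}

/*
 * Return the index of a writable version of block 'orig'.
 * If 'orig' was already created in the open transaction the copy is
 * elided and 'orig' itself is returned; otherwise a new block is
 * allocated and the old contents are copied into it.
 */
static unsigned toy_tm_shadow(struct toy_tm *tm, unsigned orig)
{
	unsigned copy;

	assert(orig < tm->next_free);
	if (tm->fresh[orig])
		return orig;

	copy = toy_tm_new_block(tm);
	memcpy(tm->data[copy], tm->data[orig], TOY_BLOCK_SIZE);
	return copy;
}

/* End the transaction: every existing block must be shadowed again
 * before it can be written. */
static void toy_tm_commit(struct toy_tm *tm)
{
	memset(tm->fresh, 0, sizeof(tm->fresh));
}
```

In this model a caller would allocate a block, write to it, commit,
and then call ``toy_tm_shadow()`` before any further write. The first
call after a commit returns a new index and leaves the committed block
untouched; a second call on the returned index gives back the same
index without copying.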

The Space Maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

On-disk data structures that keep track of reference counts of blocks.
Also acts as the allocator of new blocks. Currently two
implementations: a simpler one for managing blocks on a different
device (e.g. thinly-provisioned data blocks); and one for managing
the metadata space. The latter is complicated by the need to store
its own data within the space it's managing.

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more. For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys. For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.

Values stored in the btrees can have arbitrary size. Keys are always
64 bits, although nesting allows you to use multiple keys.
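
The nesting described above can be sketched in userspace as a lookup
that consumes one 64-bit key per level. This is an illustration only:
``struct toy_map`` and the ``toy_*`` functions are hypothetical names
made up for this example, each level is a small unsorted table rather
than an on-disk btree, and nothing here reflects the real interface in
dm-btree.h. The value found at every level except the last is treated
as the index of the map to search next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES	8

/* One level of nesting: a tiny key -> value table (hypothetical). */
struct toy_map {
	size_t nr;
	uint64_t keys[TOY_MAX_ENTRIES];
	uint64_t values[TOY_MAX_ENTRIES];
};

/* Insert or overwrite a mapping; returns 0 on success, -1 if full. */
static int toy_map_insert(struct toy_map *m, uint64_t key, uint64_t value)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		if (m->keys[i] == key) {
			m->values[i] = value;
			return 0;
		}
	}
	if (m->nr == TOY_MAX_ENTRIES)
		return -1;
	m->keys[m->nr] = key;
	m->values[m->nr] = value;
	m->nr++;
	return 0;
}

/* Look up one key in one map; returns 0 if found, -1 otherwise. */
static int toy_map_lookup(const struct toy_map *m, uint64_t key,
			  uint64_t *value)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		if (m->keys[i] == key) {
			*value = m->values[i];
			return 0;
		}
	}
	return -1;
}

/*
 * Walk 'levels' nested maps starting at maps[root], using keys[0] for
 * the outermost level and keys[levels - 1] for the innermost one.
 * Returns 0 and stores the final value, or -1 if any key is missing.
 */
static int toy_nested_lookup(const struct toy_map *maps, size_t nr_maps,
			     size_t root, unsigned levels,
			     const uint64_t *keys, uint64_t *value)
{
	uint64_t cur = root;
	uint64_t next;
	unsigned l;

	assert(levels > 0);
	for (l = 0; l < levels; l++) {
		if (cur >= nr_maps)
			return -1;
		if (toy_map_lookup(&maps[cur], keys[l], &next))
			return -1;
		cur = next;
	}
	*value = cur;
	return 0;
}
```

With two levels this mirrors the thin-provisioning layout in spirit:
``keys[0]`` plays the role of the device id and selects an inner map,
and ``keys[1]`` plays the role of the virtual block whose value is the
physical block.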