=================
Thin provisioning
=================

Introduction
============

This document describes a collection of device-mapper targets that
between them implement thin-provisioning and snapshots.

The main highlight of this implementation, compared to the previous
implementation of snapshots, is that it allows many virtual devices to
be stored on the same data volume. This simplifies administration and
allows the sharing of data between volumes, thus reducing disk usage.

Another significant feature is support for an arbitrary depth of
recursive snapshots (snapshots of snapshots of snapshots ...). The
previous implementation of snapshots did this by chaining together
lookup tables, and so performance was O(depth). This new
implementation uses a single data structure to avoid this degradation
with depth. Fragmentation may still be an issue, however, in some
scenarios.

Metadata is stored on a separate device from data, giving the
administrator some freedom, for example to:

- Improve metadata resilience by storing metadata on a mirrored volume
  but data on a non-mirrored one.

- Improve performance by storing the metadata on SSD.

Status
======

These targets are considered safe for production use. But different use
cases will have different performance characteristics, for example due
to fragmentation of the data volume.

If you find this software is not performing as expected please mail
dm-devel@redhat.com with details and we'll try our best to improve
things for you.

Userspace tools for checking and repairing the metadata have been fully
developed and are available as 'thin_check' and 'thin_repair'. The name
of the package that provides these utilities varies by distribution (on
a Red Hat distribution it is named 'device-mapper-persistent-data').

Cookbook
========

This section describes some quick recipes for using thin provisioning.
They use the dmsetup program to control the device-mapper driver
directly. End users will be advised to use a higher-level volume
manager such as LVM2 once support has been added.

Pool device
-----------

The pool device ties together the metadata volume and the data volume.
It maps I/O linearly to the data volume and updates the metadata via
two mechanisms:

- Function calls from the thin targets

- Device-mapper 'messages' from userspace which control the creation of new
  virtual devices amongst other things.

Setting up a fresh pool device
------------------------------

Setting up a pool device requires a valid metadata device and a
data device. If you do not have an existing metadata device you can
make one by zeroing the first 4k to indicate empty metadata::

    dd if=/dev/zero of=$metadata_dev bs=4096 count=1

The amount of metadata you need will vary according to how many blocks
are shared between thin devices (i.e. through snapshots). If you have
less sharing than average you'll need a larger-than-average metadata device.

As a guide, we suggest you calculate the number of bytes to use in the
metadata device as 48 * $data_dev_size / $data_block_size but round it up
to 2MB if the answer is smaller. If you're creating large numbers of
snapshots which are recording large amounts of change, you may find you
need to increase this.

The largest metadata size supported is 16GB: if the device is larger,
a warning will be issued and the excess space will not be used.

Reloading a pool table
----------------------

You may reload a pool's table; indeed, this is how the pool is resized
if it runs out of space. (N.B. While specifying a different metadata
device when reloading is not forbidden at the moment, things will go
wrong if it does not route I/O to exactly the same on-disk location as
previously.)
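
For example, a minimal sketch of growing a pool after enlarging its data
device might look like the following (the device name 'pool' and the new
length of 41943040 sectors are illustrative values; the same metadata
device must be kept)::

    dmsetup reload pool --table \
        "0 41943040 thin-pool $metadata_dev $data_dev $data_block_size $low_water_mark"
    dmsetup suspend pool
    dmsetup resume pool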

Using an existing pool device
-----------------------------

::

    dmsetup create pool \
        --table "0 20971520 thin-pool $metadata_dev $data_dev \
                 $data_block_size $low_water_mark"

$data_block_size gives the smallest unit of disk space that can be
allocated at a time, expressed in units of 512-byte sectors.
$data_block_size must be between 128 (64KB) and 2097152 (1GB) and a
multiple of 128 (64KB). $data_block_size cannot be changed after the
thin-pool is created. People primarily interested in thin provisioning
may want to use a value such as 1024 (512KB). People doing lots of
snapshotting may want a smaller value such as 128 (64KB). If you are
not zeroing newly-allocated data, a larger $data_block_size in the
region of 256000 (128MB) is suggested.

$low_water_mark is expressed in blocks of size $data_block_size. If
free space on the data device drops below this level then a dm event
will be triggered which a userspace daemon should catch, allowing it to
extend the pool device. Only one such event will be sent.

No special event is triggered if a just-resumed device's free space is below
the low water mark. However, resuming a device always triggers an
event; a userspace daemon should verify that free space exceeds the low
water mark when handling this event.

A low water mark for the metadata device is maintained in the kernel and
will trigger a dm event if free space on the metadata device drops below
it.

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written.
If no such requests are made then commits will occur every second. This
means the thin-provisioning target behaves like a physical disk that has
a volatile write cache. If power is lost you may lose some recent
writes. The metadata should always be consistent in spite of any crash.

If data space is exhausted the pool will either error or queue IO
according to the configuration (see: error_if_no_space).
If metadata space is exhausted or a metadata operation fails, the pool
will error IO until the pool is taken offline and repair is performed
to 1) fix any potential inconsistencies and 2) clear the flag that
imposes repair. Once the pool's metadata device is repaired it may be
resized, which will allow the pool to return to normal operation. Note
that if a pool is flagged as needing repair, the pool's data and
metadata devices cannot be resized until repair is performed. It should
also be noted that when the pool's metadata space is exhausted the
current metadata transaction is aborted. Given that the pool will cache
IO whose completion may have already been acknowledged to upper IO
layers (e.g. filesystem) it is strongly suggested that consistency
checks (e.g. fsck) be performed on those layers when repair of the pool
is required.

Thin provisioning
-----------------

i) Creating a new thinly-provisioned volume.

   To create a new thinly-provisioned volume you must send a message to an
   active pool device, /dev/mapper/pool in this example::

     dmsetup message /dev/mapper/pool 0 "create_thin 0"

   Here '0' is an identifier for the volume, a 24-bit number. It's up
   to the caller to allocate and manage these identifiers. If the
   identifier is already in use, the message will fail with -EEXIST.

ii) Using a thinly-provisioned volume.

    Thinly-provisioned volumes are activated using the 'thin' target::

      dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"

    The last parameter is the identifier for the thinp device.

Internal snapshots
------------------

i) Creating an internal snapshot.

   Snapshots are created with another message to the pool.

   N.B. If the origin device that you wish to snapshot is active, you
   must suspend it before creating the snapshot to avoid corruption.
   This is NOT enforced at the moment, so please be careful!

   ::

     dmsetup suspend /dev/mapper/thin
     dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
     dmsetup resume /dev/mapper/thin

   Here '1' is the identifier for the new volume, a 24-bit number. '0' is
   the identifier for the origin device.

ii) Using an internal snapshot.

    Once created, the user doesn't have to worry about any connection
    between the origin and the snapshot. Indeed the snapshot is no
    different from any other thinly-provisioned device and can be
    snapshotted itself via the same method. It's perfectly legal to
    have only one of them active, and there's no ordering requirement on
    activating or removing them both. (This differs from conventional
    device-mapper snapshots.)

    Activate it exactly the same way as any other thinly-provisioned volume::

      dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"

External snapshots
------------------

You can use an external **read only** device as an origin for a
thinly-provisioned volume. Any read to an unprovisioned area of the
thin device will be passed through to the origin. Writes trigger
the allocation of new blocks as usual.

One use case for this is VM hosts that want to run guests on
thinly-provisioned volumes but have the base image on another device
(possibly shared between many VMs).

You must not write to the origin device if you use this technique!
Of course, you may write to the thin device and take internal snapshots
of the thin volume.

i) Creating a snapshot of an external device

   This is the same as creating a thin device.
   You don't mention the origin at this stage.

   ::

     dmsetup message /dev/mapper/pool 0 "create_thin 0"

ii) Using a snapshot of an external device.

    Append an extra parameter to the thin target specifying the origin::

      dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"

    N.B. All descendants (internal snapshots) of this snapshot require the
    same extra origin parameter.

Deactivation
------------

All devices using a pool must be deactivated before the pool itself
can be.

::

    dmsetup remove thin
    dmsetup remove snap
    dmsetup remove pool

Reference
=========

'thin-pool' target
------------------

i) Constructor

    ::

      thin-pool <metadata dev> <data dev> <data block size (sectors)> \
                <low water mark (blocks)> [<number of feature args> [<arg>]*]

    Optional feature arguments:

      skip_block_zeroing:
        Skip the zeroing of newly-provisioned blocks.

      ignore_discard:
        Disable discard support.

      no_discard_passdown:
        Don't pass discards down to the underlying data device, but
        just remove the mapping.

      read_only:
        Don't allow any changes to be made to the pool metadata. This
        mode is only available after the thin-pool has been created and
        first used in full read/write mode. It cannot be specified on
        initial thin-pool creation.

      error_if_no_space:
        Error IOs, instead of queueing, if no space.

    Data block size must be between 64KB (128 sectors) and 1GB
    (2097152 sectors) inclusive.
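
    As an illustrative sketch only, a table line selecting two feature
    arguments might look like this (the device length, the 64KB block
    size and the low water mark of 32768 blocks are arbitrary example
    values)::

      dmsetup create pool \
          --table "0 20971520 thin-pool $metadata_dev $data_dev 128 32768 \
                   2 skip_block_zeroing error_if_no_space"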

ii) Status

    ::

      <transaction id> <used metadata blocks>/<total metadata blocks>
      <used data blocks>/<total data blocks> <held metadata root>
      ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
      needs_check|- metadata_low_watermark

    transaction id:
        A 64-bit number used by userspace to help synchronise with metadata
        from volume managers.

    used data blocks / total data blocks:
        If the number of free blocks drops below the pool's low water mark a
        dm event will be sent to userspace. This event is edge-triggered and
        it will occur only once after each resume, so volume manager writers
        should register for the event and then check the target's status.

    held metadata root:
        The location, in blocks, of the metadata root that has been
        'held' for userspace read access. '-' indicates there is no
        held root.

    discard_passdown|no_discard_passdown:
        Whether or not discards are actually being passed down to the
        underlying device. Even if this is enabled when the table is
        loaded, it can get disabled if the underlying device doesn't
        support it.

    ro|rw|out_of_data_space:
        If the pool encounters certain types of device failures it will
        drop into a read-only metadata mode in which no changes to
        the pool metadata (like allocating new blocks) are permitted.

        In serious cases where even a read-only mode is deemed unsafe
        no further I/O will be permitted and the status will just
        contain the string 'Fail'. The userspace recovery tools
        should then be used.

    error_if_no_space|queue_if_no_space:
        If the pool runs out of data or metadata space, the pool will
        either queue or error the IO destined to the data device. The
        default is to queue the IO until more space is added or the
        'no_space_timeout' expires. The 'no_space_timeout' dm-thin-pool
        module parameter can be used to change this timeout -- it
        defaults to 60 seconds but may be disabled using a value of 0.

    needs_check:
        A metadata operation has failed, resulting in the needs_check
        flag being set in the metadata's superblock. The metadata
        device must be deactivated and checked/repaired before the
        thin-pool can be made fully operational again. '-' indicates
        needs_check is not set.

    metadata_low_watermark:
        Value of metadata low watermark in blocks. The kernel sets this
        value internally but userspace needs to know this value to
        determine if an event was caused by crossing this threshold.

iii) Messages

    create_thin <dev id>
        Create a new thinly-provisioned device.
        <dev id> is an arbitrary unique 24-bit identifier chosen by
        the caller.

    create_snap <dev id> <origin id>
        Create a new snapshot of another thinly-provisioned device.
        <dev id> is an arbitrary unique 24-bit identifier chosen by
        the caller.
        <origin id> is the identifier of the thinly-provisioned device
        of which the new device will be a snapshot.

    delete <dev id>
        Deletes a thin device. Irreversible.

    set_transaction_id <current id> <new id>
        Userland volume managers, such as LVM, need a way to
        synchronise their external metadata with the internal metadata of the
        pool target. The thin-pool target offers to store an
        arbitrary 64-bit transaction id and return it on the target's
        status line. To avoid races you must provide what you think
        the current transaction id is when you change it with this
        compare-and-swap message.

    reserve_metadata_snap
        Reserve a copy of the data mapping btree for use by userland.
        This allows userland to inspect the mappings as they were when
        this message was executed. Use the pool's status command to
        get the root block associated with the metadata snapshot.

    release_metadata_snap
        Release a previously reserved copy of the data mapping btree.
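
    As an illustrative sketch (the pool name and the transaction ids are
    assumed), messages are sent to an active pool with dmsetup, and the
    held metadata root can then be read back from the pool's status
    line::

      dmsetup message /dev/mapper/pool 0 "set_transaction_id 0 1"
      dmsetup message /dev/mapper/pool 0 reserve_metadata_snap
      dmsetup status /dev/mapper/pool      # <held metadata root> is now set
      dmsetup message /dev/mapper/pool 0 release_metadata_snap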

'thin' target
-------------

i) Constructor

    ::

      thin <pool dev> <dev id> [<external origin dev>]

    pool dev:
        the thin-pool device, e.g. /dev/mapper/my_pool or 253:0

    dev id:
        the internal device identifier of the device to be
        activated.

    external origin dev:
        an optional block device outside the pool to be treated as a
        read-only snapshot origin: reads to unprovisioned areas of the
        thin target will be mapped to this device.

The pool doesn't store any size against the thin devices. If you
load a thin target that is smaller than you've been using previously,
then you'll have no access to blocks mapped beyond the end. If you
load a target that is bigger than before, then extra blocks will be
provisioned as and when needed.

ii) Status

    <nr mapped sectors> <highest mapped sector>
        If the pool has encountered device errors and failed, the status
        will just contain the string 'Fail'. The userspace recovery
        tools should then be used.

        In the case where <nr mapped sectors> is 0, there is no highest
        mapped sector and the value of <highest mapped sector> is unspecified.