.. SPDX-License-Identifier: GPL-2.0-only

========
dm-clone
========

Introduction
============

dm-clone is a device mapper target which produces a one-to-one copy of an
existing, read-only source device into a writable destination device. It
presents a virtual block device which makes all data appear immediately, and
redirects reads and writes accordingly.

The main use case of dm-clone is to clone a potentially remote, high-latency,
read-only, archival-type block device into a writable, fast, primary-type device
for fast, low-latency I/O. The cloned device is visible/mountable immediately
and the copy of the source device to the destination device happens in the
background, in parallel with user I/O.

For example, one could restore an application backup from a read-only copy,
accessible through a network storage protocol (NBD, Fibre Channel, iSCSI, AoE,
etc.), into a local SSD or NVMe device, and start using the device immediately,
without waiting for the restore to complete.

When the cloning completes, the dm-clone table can be removed altogether and be
replaced, e.g., by a linear table, mapping directly to the destination device.

The dm-clone target reuses the metadata library used by the thin-provisioning
target.

Glossary
========

   Hydration
     The process of filling a region of the destination device with data from
     the same region of the source device, i.e., copying the region from the
     source to the destination device.

Once a region gets hydrated we redirect all I/O regarding it to the destination
device.

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with other
parameters detailed later):

1. A source device - the read-only device that gets cloned and serves as the
   source of the hydration.

2. A destination device - the destination of the hydration, which will become a
   clone of the source device.

3. A small metadata device - it records which regions are already valid in the
   destination device, i.e., which regions have already been hydrated, or have
   been written to directly, via user I/O.

The size of the destination device must be at least equal to the size of the
source device.

Regions
-------

dm-clone divides the source and destination devices into fixed-size regions.
Regions are the unit of hydration, i.e., the minimum amount of data copied from
the source to the destination device.

The region size is configurable when you first create the dm-clone device. The
recommended region size is the same as the file system block size, which usually
is 4KB. The region size must be a power of two between 8 sectors (4KB) and
2097152 sectors (1GB).
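
As a rough illustration of these constraints, the following shell sketch checks
a candidate region size and estimates how many regions a device is split into.
The function names are made up for this example, and rounding a trailing partial
region up to a whole region is an assumption, not something stated above.

  ::

   # Succeeds if the region size (in 512-byte sectors) is a power of two
   # between 8 and 2097152 sectors.
   valid_region_size() {
       rs=$1
       [ "$rs" -ge 8 ] && [ "$rs" -le 2097152 ] &&
           [ $(( rs & (rs - 1) )) -eq 0 ]
   }

   # Regions needed to cover <device size> sectors with <region size> sectors
   # per region, rounding up (assumption).
   nr_regions() {
       echo $(( ($1 + $2 - 1) / $2 ))
   }

For example, `nr_regions 1048576000 8` prints 131072000, i.e., a 500GB device
with 4KB regions is tracked as roughly 131 million regions.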

Reads and writes from/to hydrated regions are serviced from the destination
device.

A read to a not yet hydrated region is serviced directly from the source device.

A write to a not yet hydrated region starts the hydration of that region
immediately and is delayed until the hydration completes.

Note that a write request with size equal to region size will skip copying of
the corresponding region from the source device and overwrite the region of the
destination device directly.
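
The routing rules above can be summarised as a small decision function. This is
only a model of the described behaviour written in shell, not how the kernel
implements it; the name `route_bio`, its arguments and its output strings are
invented for this sketch.

  ::

   # route_bio <read|write> <hydrated|unhydrated> [full]
   # Prints which device services the request. The optional third argument
   # marks a write whose size equals the region size.
   route_bio() {
       op=$1 state=$2 span=${3:-partial}
       if [ "$state" = hydrated ]; then
           echo "destination"
       elif [ "$op" = read ]; then
           echo "source"
       elif [ "$span" = full ]; then
           echo "destination (overwrite, no copy)"
       else
           echo "hydrate region, then destination"
       fi
   }

For instance, `route_bio write unhydrated` prints "hydrate region, then
destination", while `route_bio write unhydrated full` reports a direct
overwrite of the destination.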

Discards
--------

dm-clone interprets a discard request to a range that hasn't been hydrated yet
as a hint to skip hydration of the regions covered by the request, i.e., it
skips copying the region's data from the source to the destination device, and
only updates its metadata.

If the destination device supports discards, then by default dm-clone will pass
down discard requests to it.

Background Hydration
--------------------

dm-clone copies continuously from the source to the destination device, until
all of the device has been copied.

Copying data from the source to the destination device uses bandwidth. The user
can set a throttle to prevent more than a certain amount of copying occurring at
any one time. Moreover, dm-clone takes into account user I/O traffic going to
the devices and pauses the background hydration when there is I/O in-flight.

A message `hydration_threshold <#regions>` can be used to set the maximum number
of regions being copied at any one time, the default being 1 region.

dm-clone employs dm-kcopyd for copying portions of the source device to the
destination device. By default, we issue copy requests of size equal to the
region size. A message `hydration_batch_size <#regions>` can be used to tune the
size of these copy requests. Increasing the hydration batch size results in
dm-clone trying to batch together contiguous regions, so we copy the data in
batches of this many regions.

When the hydration of the destination device finishes, a dm event will be sent
to user space.
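
A minimal sketch of how the two tuning messages might be sent from a script is
shown below. The helper name `tune_hydration` and the `DMSETUP` override
variable are assumptions made for this example, and the values 256 and 16 used
afterwards are arbitrary rather than recommendations.

  ::

   # Command used to talk to device-mapper. Overriding it (hypothetical
   # convention, e.g. DMSETUP=echo) gives a dry run that only prints the
   # arguments.
   : "${DMSETUP:=dmsetup}"

   # tune_hydration <dm name> <threshold in regions> <batch size in regions>
   tune_hydration() {
       "$DMSETUP" message "$1" 0 hydration_threshold "$2" &&
       "$DMSETUP" message "$1" 0 hydration_batch_size "$3"
   }

Calling `tune_hydration clone 256 16` as root would allow up to 256 regions to
be copied at once, in batches of 16 contiguous regions. `dmsetup wait clone`
can then be used to block until the next dm event on the device. Since events
are not only sent when hydration finishes, the status should be re-checked
afterwards.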

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written. If no
such requests are made then commits will occur every second. This means the
dm-clone device behaves like a physical disk that has a volatile write cache. If
power is lost you may lose some recent writes. The metadata should always be
consistent in spite of any crash.

Target Interface
================

Constructor
-----------

  ::

   clone <metadata dev> <destination dev> <source dev> <region size>
         [<#feature args> [<feature arg>]* [<#core args> [<core arg>]*]]

 ================ ==============================================================
 metadata dev     Fast device holding the persistent metadata
 destination dev  The destination device, where the source will be cloned
 source dev       Read-only device containing the data that gets cloned
 region size      The size of a region in sectors

 #feature args    Number of feature arguments passed
 feature args     no_hydration or no_discard_passdown

 #core args       An even number of arguments corresponding to key/value pairs
                  passed to dm-clone
 core args        Key/value pairs passed to dm-clone, e.g. `hydration_threshold
                  256`
 ================ ==============================================================

Optional feature arguments are:

 ==================== =========================================================
 no_hydration         Create a dm-clone instance with background hydration
                      disabled
 no_discard_passdown  Disable passing down discards to the destination device
 ==================== =========================================================

Optional core arguments are:

 ================================ ==============================================
 hydration_threshold <#regions>   Maximum number of regions being copied from
                                  the source to the destination device at any
                                  one time, during background hydration.
 hydration_batch_size <#regions>  During background hydration, try to batch
                                  together contiguous regions, so we copy data
                                  from the source to the destination device in
                                  batches of this many regions.
 ================================ ==============================================
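
Because the core argument group can only appear after a feature argument group,
a table with core arguments but no features still needs an explicit feature
count of 0. The following sketch assembles a table line under that reading of
the constructor syntax. The helper `clone_table` and its `--` separator are
conventions invented for this example.

  ::

   # clone_table <length> <metadata dev> <dest dev> <source dev> <region size>
   #             [feature args...] [-- core key/value pairs...]
   clone_table() {
       table="0 $1 clone $2 $3 $4 $5"
       shift 5
       features="" nfeat=0 core="" ncore=0 in_core=0
       for arg in "$@"; do
           if [ "$arg" = "--" ]; then
               in_core=1
           elif [ "$in_core" -eq 0 ]; then
               features="$features $arg"
               nfeat=$((nfeat + 1))
           else
               core="$core $arg"
               ncore=$((ncore + 1))
           fi
       done
       # Core arguments are key/value pairs, so their count must be even.
       if [ $((ncore % 2)) -ne 0 ]; then
           echo "core arguments must be key/value pairs" >&2
           return 1
       fi
       # Emit the feature group whenever any optional argument is present,
       # even if it is empty, so that the core group can follow it.
       if [ "$nfeat" -gt 0 ] || [ "$ncore" -gt 0 ]; then
           table="$table $nfeat$features"
       fi
       if [ "$ncore" -gt 0 ]; then
           table="$table $ncore$core"
       fi
       echo "$table"
   }

For example, `clone_table 1048576000 $metadata_dev $dest_dev $source_dev 8
no_hydration -- hydration_threshold 256` prints a table ending in
"8 1 no_hydration 2 hydration_threshold 256", which could be passed to
`dmsetup create clone --table`.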

Status
------

  ::

   <metadata block size> <#used metadata blocks>/<#total metadata blocks>
   <region size> <#hydrated regions>/<#total regions> <#hydrating regions>
   <#feature args> <feature args>* <#core args> <core args>*
   <clone metadata mode>

 ======================= =======================================================
 metadata block size     Fixed block size for each metadata block in sectors
 #used metadata blocks   Number of metadata blocks used
 #total metadata blocks  Total number of metadata blocks
 region size             Configurable region size for the device in sectors
 #hydrated regions       Number of regions that have finished hydrating
 #total regions          Total number of regions to hydrate
 #hydrating regions      Number of regions currently hydrating
 #feature args           Number of feature arguments to follow
 feature args            Feature arguments, e.g. `no_hydration`
 #core args              Even number of core arguments to follow
 core args               Key/value pairs for tuning the core, e.g.
                         `hydration_threshold 256`
 clone metadata mode     ro if read-only, rw if read-write

                         In serious cases where even a read-only mode is deemed
                         unsafe no further I/O will be permitted and the status
                         will just contain the string 'Fail'. If the metadata
                         mode changes, a dm event will be sent to user space.
 ======================= =======================================================
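
As an illustration of the status format, the sketch below extracts the hydration
progress from one line of `dmsetup status` output. It assumes that dmsetup
prefixes the target status with `<start> <length> <target name>`, so that the
`<#hydrated regions>/<#total regions>` pair is the seventh field. The function
name is made up for this example.

  ::

   # hydration_progress "<dmsetup status line>"
   # Prints "<hydrated> <total> <percent>" for a dm-clone status line.
   hydration_progress() {
       # Split the line into positional parameters.
       set -- $1
       # A failed device reports only 'Fail', so too few fields are present.
       if [ $# -lt 8 ] || [ "$3" != clone ]; then
           echo "unexpected status: $*" >&2
           return 1
       fi
       hydrated=${7%/*}
       total=${7#*/}
       echo "$hydrated $total $(( hydrated * 100 / total ))"
   }

Typical use would be `hydration_progress "$(dmsetup status clone)"`. For a
status line whose seventh field is 32768000/131072000 this prints
"32768000 131072000 25".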

Messages
--------

  `disable_hydration`
      Disable the background hydration of the destination device.

  `enable_hydration`
      Enable the background hydration of the destination device.

  `hydration_threshold <#regions>`
      Set background hydration threshold.

  `hydration_batch_size <#regions>`
      Set background hydration batch size.

Examples
========

Clone a device containing a file system
---------------------------------------

1. Create the dm-clone device.

   ::

    dmsetup create clone --table "0 1048576000 clone $metadata_dev $dest_dev \
      $source_dev 8 1 no_hydration"

2. Mount the device and trim the file system. dm-clone interprets the discards
   sent by the file system and it will not hydrate the unused space.

   ::

    mount /dev/mapper/clone /mnt/cloned-fs
    fstrim /mnt/cloned-fs

3. Enable background hydration of the destination device.

   ::

    dmsetup message clone 0 enable_hydration

4. When the hydration finishes, we can replace the dm-clone table with a linear
   table.

   ::

    dmsetup suspend clone
    dmsetup load clone --table "0 1048576000 linear $dest_dev 0"
    dmsetup resume clone

   The metadata device is no longer needed and can be safely discarded or reused
   for other purposes.
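
Step 4 assumes that the hydration has finished. One way a script might check
this is to compare the hydrated and total region counts in the status line, as
sketched below. The function name is hypothetical, and the field positions
assume that `dmsetup status` prefixes the target status with
`<start> <length> <target name>`.

  ::

   # hydration_done "<dmsetup status line>"
   # Succeeds when every region is hydrated and none is still being copied.
   hydration_done() {
       set -- $1
       if [ $# -lt 8 ] || [ "$3" != clone ]; then
           return 1
       fi
       [ "${7%/*}" = "${7#*/}" ] && [ "$8" -eq 0 ]
   }

   # Example (requires root and an existing "clone" device):
   #   until hydration_done "$(dmsetup status clone)"; do sleep 10; done

Polling is used here for simplicity. Waiting for the dm event sent when
hydration finishes avoids the sleep interval, at the cost of having to re-check
the status after each event.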

Known issues
============

1. We redirect reads of not-yet-hydrated regions to the source device. If
   reading the source device has high latency and the user repeatedly reads from
   the same regions, this behaviour could degrade performance. We should use
   these reads as hints to hydrate the relevant regions sooner. Currently, we
   rely on the page cache to cache these regions, so we hopefully don't end up
   reading them multiple times from the source device.

2. We should release in-core resources, i.e., the bitmaps tracking which regions
   are hydrated, after the hydration has finished.

3. During background hydration, if we fail to read the source or write to the
   destination device, we print an error message, but the hydration process
   continues indefinitely, until it succeeds. We should stop the background
   hydration after a number of failures and emit a dm event for user space to
   notice.

Why not...?
===========

We explored the following alternatives before implementing dm-clone:

1. Use dm-cache with a cache size equal to the size of the source device and
   implement a new cloning policy:

   * The resulting cache device is not a one-to-one mirror of the source device
     and thus we cannot remove the cache device once cloning completes.

   * dm-cache writes to the source device, which violates our requirement that
     the source device must be treated as read-only.

   * Caching is semantically different from cloning.

2. Use dm-snapshot with a COW device equal in size to the source device:

   * dm-snapshot stores its metadata in the COW device, so the resulting device
     is not a one-to-one mirror of the source device.

   * No background copying mechanism.

   * dm-snapshot needs to commit its metadata whenever a pending exception
     completes, to ensure snapshot consistency. In the case of cloning, we don't
     need to be so strict and can rely on committing metadata every time a FLUSH
     or FUA bio is written, or periodically, like dm-thin and dm-cache do. This
     improves the performance significantly.

3. Use dm-mirror: The mirror target has a background copying/mirroring
   mechanism, but it writes to all mirrors, thus violating our requirement that
   the source device must be treated as read-only.

4. Use dm-thin's external snapshot functionality. This approach is the most
   promising among all alternatives, as the thinly-provisioned volume is a
   one-to-one mirror of the source device and handles reads and writes to
   un-provisioned/not-yet-cloned areas the same way as dm-clone does.

   Still:

   * There is no background copying mechanism, though one could be implemented.

   * Most importantly, we want to support arbitrary block devices as the
     destination of the cloning process and not restrict ourselves to
     thinly-provisioned volumes. Thin-provisioning has an inherent metadata
     overhead, for maintaining the thin volume mappings, which significantly
     degrades performance.

   Moreover, cloning a device shouldn't force the use of thin-provisioning. On
   the other hand, if we wish to use thin provisioning, we can just use a thin
   LV as dm-clone's destination device.
