== General ==

A qcow2 image file is organized in units of constant size, which are called
(host) clusters. A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.

Likewise, the virtual disk as seen by the guest is divided into (guest)
clusters of the same size.

All numbers in qcow2 are stored in Big Endian byte order.


== Header ==

The first cluster of a qcow2 image contains the file header:

    Byte  0 -  3:   magic
                    QCOW magic string ("QFI\xfb")

          4 -  7:   version
                    Version number (valid values are 2 and 3)

          8 - 15:   backing_file_offset
                    Offset into the image file at which the backing file name
                    is stored (NB: The string is not null terminated). 0 if the
                    image doesn't have a backing file.

         16 - 19:   backing_file_size
                    Length of the backing file name in bytes. Must not be
                    longer than 1023 bytes. Undefined if the image doesn't have
                    a backing file.

         20 - 23:   cluster_bits
                    Number of bits that are used for addressing an offset
                    within a cluster (1 << cluster_bits is the cluster size).
                    Must not be less than 9 (i.e. 512 byte clusters).

                    Note: qemu as of today has an implementation limit of 2 MB
                    as the maximum cluster size and won't be able to open images
                    with larger cluster sizes.

         24 - 31:   size
                    Virtual disk size in bytes

         32 - 35:   crypt_method
                    0 for no encryption
                    1 for AES encryption

         36 - 39:   l1_size
                    Number of entries in the active L1 table

         40 - 47:   l1_table_offset
                    Offset into the image file at which the active L1 table
                    starts. Must be aligned to a cluster boundary.

         48 - 55:   refcount_table_offset
                    Offset into the image file at which the refcount table
                    starts. Must be aligned to a cluster boundary.

         56 - 59:   refcount_table_clusters
                    Number of clusters that the refcount table occupies

         60 - 63:   nb_snapshots
                    Number of snapshots contained in the image

         64 - 71:   snapshots_offset
                    Offset into the image file at which the snapshot table
                    starts. Must be aligned to a cluster boundary.

If the version is 3 or higher, the header has the following additional fields.
For version 2, the values are assumed to be zero, unless specified otherwise
in the description of a field.

         72 -  79:  incompatible_features
                    Bitmask of incompatible features. An implementation must
                    fail to open an image if an unknown bit is set.

                    Bit 0:      Dirty bit. If this bit is set then refcounts
                                may be inconsistent; make sure to scan L1/L2
                                tables to repair refcounts before accessing the
                                image.

                    Bit 1:      Corrupt bit. If this bit is set then any data
                                structure may be corrupt and the image must not
                                be written to (unless for regaining
                                consistency).

                    Bits 2-63:  Reserved (set to 0)

         80 -  87:  compatible_features
                    Bitmask of compatible features. An implementation can
                    safely ignore any unknown bits that are set.

                    Bit 0:      Lazy refcounts bit. If this bit is set then
                                lazy refcount updates can be used. This means
                                marking the image file dirty and postponing
                                refcount metadata updates.

                    Bits 1-63:  Reserved (set to 0)

         88 -  95:  autoclear_features
                    Bitmask of auto-clear features. An implementation may only
                    write to an image with unknown auto-clear features if it
                    clears the respective bits from this field first.

                    Bit 0:      Bitmaps extension bit
                                This bit indicates consistency for the bitmaps
                                extension data.

                                It is an error if this bit is set without the
                                bitmaps extension present.

                                If the bitmaps extension is present but this
                                bit is unset, the bitmaps extension data must be
                                considered inconsistent.

                    Bits 1-63:  Reserved (set to 0)

         96 -  99:  refcount_order
                    Describes the width of a reference count block entry (width
                    in bits: refcount_bits = 1 << refcount_order). For version 2
                    images, the order is always assumed to be 4
                    (i.e. refcount_bits = 16).
                    This value may not exceed 6 (i.e. refcount_bits = 64).

        100 - 103:  header_length
                    Length of the header structure in bytes. For version 2
                    images, the length is always assumed to be 72 bytes.

Directly after the image header, optional sections called header extensions can
be stored. Each extension has a structure like the following:

    Byte  0 -  3:   Header extension type:
                        0x00000000 - End of the header extension area
                        0xE2792ACA - Backing file format name
                        0x6803f857 - Feature name table
                        0x23852875 - Bitmaps extension
                        other      - Unknown header extension, can be safely
                                     ignored

          4 -  7:   Length of the header extension data

          8 -  n:   Header extension data

          n -  m:   Padding to round up the header extension size to the next
                    multiple of 8.

Unless stated otherwise, each header extension type shall appear at most once
in the same image.

If the image has a backing file then the backing file name should be stored in
the remaining space between the end of the header extension area and the end of
the first cluster. It is not allowed to store other data here, so that an
implementation can safely modify the header and add extensions without harming
data of compatible features that it doesn't support. Compatible features that
need space for additional data can use a header extension.


== Feature name table ==

The feature name table is an optional header extension that contains the name
for features used by the image. It can be used by applications that don't know
the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.
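
To reach the feature name table, a reader first has to walk the header
extension area described above. The following Python sketch shows one way this
could be done. The function names (iter_header_extensions,
find_feature_name_table) are hypothetical and not part of the format, and the
sketch assumes that the extension area starts at header_length and lies
entirely within the first cluster, which is passed in as a bytes object.

```python
import struct

EXT_END = 0x00000000                 # end of the header extension area
EXT_FEATURE_NAME_TABLE = 0x6803F857  # feature name table


def iter_header_extensions(first_cluster: bytes, header_length: int):
    """Yield (type, data) for each header extension (hypothetical helper).

    Assumes the extension area begins directly at header_length.
    """
    pos = header_length
    while pos + 8 <= len(first_cluster):
        # Type and length are 32-bit big-endian values.
        ext_type, ext_len = struct.unpack_from(">II", first_cluster, pos)
        if ext_type == EXT_END:
            return
        start = pos + 8
        end = start + ext_len
        if end > len(first_cluster):
            raise ValueError("header extension runs past the first cluster")
        yield ext_type, first_cluster[start:end]
        # Data is padded so that the next extension starts on a multiple of 8.
        pos = start + ((ext_len + 7) & ~7)


def find_feature_name_table(first_cluster: bytes, header_length: int):
    """Return the raw feature name table data, or None if it is absent."""
    for ext_type, data in iter_header_extensions(first_cluster, header_length):
        if ext_type == EXT_FEATURE_NAME_TABLE:
            return data
    return None
```

Unknown extension types are simply yielded and can be skipped by the caller,
which matches the rule that unknown header extensions can be safely ignored.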

The number of entries in the feature name table is determined by the length of
the header extension data. Each entry looks like this:

    Byte       0:   Type of feature (select feature bitmap)
                        0: Incompatible feature
                        1: Compatible feature
                        2: Autoclear feature

               1:   Bit number within the selected feature bitmap (valid
                    values: 0-63)

          2 - 47:   Feature name (padded with zeros, but not necessarily null
                    terminated if it has full length)


== Bitmaps extension ==

The bitmaps extension is an optional header extension. It provides the ability
to store bitmaps related to a virtual disk. For now, there is only one bitmap
type: the dirty tracking bitmap, which tracks virtual disk changes from some
point in time.

The data of the extension should be considered consistent only if the
corresponding auto-clear feature bit is set, see autoclear_features above.

The fields of the bitmaps extension are:

    Byte  0 -  3:  nb_bitmaps
                   The number of bitmaps contained in the image. Must be
                   greater than or equal to 1.

                   Note: Qemu currently only supports up to 65535 bitmaps per
                   image.

          4 -  7:  Reserved, must be zero.

          8 - 15:  bitmap_directory_size
                   Size of the bitmap directory in bytes. It is the cumulative
                   size of all (nb_bitmaps) bitmap headers.

         16 - 23:  bitmap_directory_offset
                   Offset into the image file at which the bitmap directory
                   starts. Must be aligned to a cluster boundary.


== Host cluster management ==

qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
that it is used, and >= 2 means that it is used and any write access must
perform a COW (copy on write) operation.

The refcounts are managed in a two-level table. The first level is called
refcount table and has a variable size (which is stored in the header).
The refcount table can cover multiple clusters, however it needs to be
contiguous in the image file.

It contains pointers to the second level structures which are called refcount
blocks and are exactly one cluster in size.

Given an offset into the image file, the refcount of its cluster can be
obtained as follows:

    refcount_block_entries = (cluster_size * 8 / refcount_bits)

    refcount_block_index = (offset / cluster_size) % refcount_block_entries
    refcount_table_index = (offset / cluster_size) / refcount_block_entries

    refcount_block = load_cluster(refcount_table[refcount_table_index]);
    return refcount_block[refcount_block_index];

Refcount table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 63:    Bits 9-63 of the offset into the image file at which the
                    refcount block starts. Must be aligned to a cluster
                    boundary.

                    If this is 0, the corresponding refcount block has not yet
                    been allocated. All refcounts managed by this refcount block
                    are 0.

Refcount block entry (x = refcount_bits - 1):

    Bit  0 -  x:    Reference count of the cluster. If refcount_bits implies a
                    sub-byte width, note that bit 0 means the least significant
                    bit in this context.


== Cluster mapping ==

Just as for refcounts, qcow2 uses a two-level structure for the mapping of
guest clusters to host clusters. They are called L1 and L2 tables.

The L1 table has a variable size (stored in the header) and may use multiple
clusters, however it must be contiguous in the image file. L2 tables are
exactly one cluster in size.
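
Because an L2 table is exactly one cluster and its entries are 64-bit values,
the amount of guest data covered by one L2 table follows directly from the
cluster size. The Python sketch below works through that arithmetic. The
function names are hypothetical, and min_l1_entries only gives a lower bound
for the guest-visible disk: the l1_size stored in the header may be larger,
for example when VM state is stored beyond the end of the virtual disk.

```python
L2_ENTRY_SIZE = 8  # bytes; each L2 entry is a 64-bit value


def l2_coverage(cluster_bits: int) -> int:
    """Bytes of guest address space mapped by a single L2 table."""
    cluster_size = 1 << cluster_bits
    l2_entries = cluster_size // L2_ENTRY_SIZE
    return l2_entries * cluster_size


def min_l1_entries(virtual_size: int, cluster_bits: int) -> int:
    """Smallest number of L1 entries that can map virtual_size bytes."""
    coverage = l2_coverage(cluster_bits)
    return (virtual_size + coverage - 1) // coverage  # round up
```

With cluster_bits = 16 (64 KiB clusters), one L2 table holds 8192 entries and
maps 512 MiB, so a 10 GiB virtual disk needs at least 20 L1 entries.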

Given an offset into the virtual disk, the offset into the image file can be
obtained as follows:

    l2_entries = (cluster_size / sizeof(uint64_t))

    l2_index = (offset / cluster_size) % l2_entries
    l1_index = (offset / cluster_size) / l2_entries

    l2_table = load_cluster(l1_table[l1_index]);
    cluster_offset = l2_table[l2_index];

    return cluster_offset + (offset % cluster_size)

L1 table entry:

    Bit  0 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of the offset into the image file at which the L2
                    table starts. Must be aligned to a cluster boundary. If the
                    offset is 0, the L2 table and all clusters described by this
                    L2 table are unallocated.

        56 - 62:    Reserved (set to 0)

             63:    0 for an L2 table that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in the active L1 table.

L2 table entry:

    Bit  0 -  61:   Cluster descriptor

              62:   0 for standard clusters
                    1 for compressed clusters

              63:   0 for a cluster that is unused or requires COW, 1 if its
                    refcount is exactly one. This information is only accurate
                    in L2 tables that are reachable from the active L1
                    table.

Standard Cluster Descriptor:

    Bit       0:    If set to 1, the cluster reads as all zeros. The host
                    cluster offset can be used to describe a preallocation,
                    but it won't be used for reading data from this cluster,
                    nor is data read from the backing file if the cluster is
                    unallocated.

                    With version 2, this is always 0.

         1 -  8:    Reserved (set to 0)

         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
                    cluster boundary. If the offset is 0, the cluster is
                    unallocated.

        56 - 61:    Reserved (set to 0)


Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):

    Bit  0 -  x:    Host cluster offset. This is usually _not_ aligned to a
                    cluster boundary!

       x+1 - 61:    Compressed size of the images in sectors of 512 bytes

If a cluster is unallocated, read requests shall read the data from the backing
file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
no backing file or the backing file is smaller than the image, they shall read
zeros for all parts that are not covered by the backing file.


== Snapshots ==

qcow2 supports internal snapshots. Their basic principle of operation is to
switch the active L1 table, so that a different set of host clusters is
exposed to the guest.

When creating a snapshot, the L1 table should be copied and the refcount of all
L2 tables and clusters reachable from this L1 table must be increased, so that
a write causes a COW and isn't visible in other snapshots.

When loading a snapshot, bit 63 of all entries in the new active L1 table and
all L2 tables referenced by it must be reconstructed from the refcount table
as it doesn't need to be accurate in inactive L1 tables.

A directory of all snapshots is stored in the snapshot table, a contiguous area
in the image file, whose starting offset and length are given by the header
fields snapshots_offset and nb_snapshots. The entries of the snapshot table
have variable length, depending on the length of ID, name and extra data.

Snapshot table entry:

    Byte 0 -  7:    Offset into the image file at which the L1 table for the
                    snapshot starts. Must be aligned to a cluster boundary.

         8 - 11:    Number of entries in the L1 table of the snapshot

        12 - 13:    Length of the unique ID string describing the snapshot

        14 - 15:    Length of the name of the snapshot

        16 - 19:    Time at which the snapshot was taken in seconds since the
                    Epoch

        20 - 23:    Subsecond part of the time at which the snapshot was taken
                    in nanoseconds

        24 - 31:    Time that the guest was running until the snapshot was
                    taken in nanoseconds

        32 - 35:    Size of the VM state in bytes. 0 if no VM state is saved.
                    If there is VM state, it starts at the first cluster
                    described by the first L1 table entry that doesn't describe
                    a regular guest cluster (i.e. VM state is stored like guest
                    disk content, except that it is stored at offsets that are
                    larger than the virtual disk presented to the guest)

        36 - 39:    Size of extra data in the table entry (used for future
                    extensions of the format)

        variable:   Extra data for future extensions. Unknown fields must be
                    ignored. Currently defined are (offset relative to snapshot
                    table entry):

                    Byte 40 - 47:   Size of the VM state in bytes. 0 if no VM
                                    state is saved. If this field is present,
                                    the 32-bit value in bytes 32-35 is ignored.

                    Byte 48 - 55:   Virtual disk size of the snapshot in bytes

                    Version 3 images must include extra data at least up to
                    byte 55.

        variable:   Unique ID string for the snapshot (not null terminated)

        variable:   Name of the snapshot (not null terminated)

        variable:   Padding to round up the snapshot table entry size to the
                    next multiple of 8.


== Bitmaps ==

As mentioned above, the bitmaps extension provides the ability to store bitmaps
related to a virtual disk. This section describes how these bitmaps are stored.

All stored bitmaps are related to the virtual disk stored in the same image, so
each bitmap size is equal to the virtual disk size.

Each bit of the bitmap is responsible for a strictly defined range of the
virtual disk. For bit number bit_nr the corresponding range (in bytes) will be:

    [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]

Granularity is a property of the concrete bitmap, see below.


=== Bitmap directory ===

Each bitmap saved in the image is described in a bitmap directory entry. The
bitmap directory is a contiguous area in the image file, whose starting offset
and length are given by the header extension fields bitmap_directory_offset and
bitmap_directory_size. The entries of the bitmap directory have variable
length, depending on the lengths of the bitmap name and extra data. These
entries are also called bitmap headers.

Structure of a bitmap directory entry:

    Byte 0 -  7:    bitmap_table_offset
                    Offset into the image file at which the bitmap table
                    (described below) for the bitmap starts. Must be aligned to
                    a cluster boundary.

         8 - 11:    bitmap_table_size
                    Number of entries in the bitmap table of the bitmap.

        12 - 15:    flags
                    Bit
                      0: in_use
                         The bitmap was not saved correctly and may be
                         inconsistent.

                      1: auto
                         The bitmap must reflect all changes of the virtual
                         disk by any application that would write to this qcow2
                         file (including writes, snapshot switching, etc.). The
                         type of this bitmap must be 'dirty tracking bitmap'.

                      2: extra_data_compatible
                         This flag is meaningful when the extra data is
                         unknown to the software (currently any extra data is
                         unknown to Qemu).
                         If it is set, the bitmap may be used as expected, extra
                         data must be left as is.
                         If it is not set, the bitmap must not be used, but
                         both it and its extra data must be left as is.

                    Bits 3 - 31 are reserved and must be 0.

             16:    type
                    This field describes the sort of the bitmap.
                    Values:
                      1: Dirty tracking bitmap

                    Values 0, 2 - 255 are reserved.

             17:    granularity_bits
                    Granularity bits. Valid values: 0 - 63.

                    Note: Qemu currently doesn't support granularity_bits
                    greater than 31.

                    Granularity is calculated as
                        granularity = 1 << granularity_bits

                    A bitmap's granularity is how many bytes of the image
                    account for one bit of the bitmap.

        18 - 19:    name_size
                    Size of the bitmap name. Must be non-zero.

                    Note: Qemu currently doesn't support values greater than
                    1023.

        20 - 23:    extra_data_size
                    Size of type-specific extra data.

                    For now, as no extra data is defined, extra_data_size is
                    reserved and should be zero. If it is non-zero the
                    behavior is defined by extra_data_compatible flag.

        variable:   extra_data
                    Extra data for the bitmap, occupying extra_data_size bytes.
                    Extra data must never contain references to clusters or in
                    some other way allocate additional clusters.

        variable:   name
                    The name of the bitmap (not null terminated), occupying
                    name_size bytes. Must be unique among all bitmap names
                    within the bitmaps extension.

        variable:   Padding to round up the bitmap directory entry size to the
                    next multiple of 8. All bytes of the padding must be zero.


=== Bitmap table ===

Each bitmap is stored using a one-level structure (as opposed to two-level
structures like for refcounts and guest clusters mapping) for the mapping of
bitmap data to host clusters. This structure is called the bitmap table.

Each bitmap table has a variable size (stored in the bitmap directory entry)
and may use multiple clusters, however, it must be contiguous in the image
file.

Structure of a bitmap table entry:

    Bit       0:    Reserved and must be zero if bits 9 - 55 are non-zero.
                    If bits 9 - 55 are zero:
                      0: Cluster should be read as all zeros.
                      1: Cluster should be read as all ones.

         1 -  8:    Reserved and must be zero.

         9 - 55:    Bits 9 - 55 of the host cluster offset. Must be aligned to
                    a cluster boundary. If the offset is 0, the cluster is
                    unallocated; in that case, bit 0 determines how this
                    cluster should be treated during reads.

        56 - 63:    Reserved and must be zero.


=== Bitmap data ===

As noted above, bitmap data is stored in separate clusters, described by the
bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
the image file can be obtained as follows:

    image_offset(bitmap_data_offset) =
        bitmap_table[bitmap_data_offset / cluster_size] +
            (bitmap_data_offset % cluster_size)

This offset is not defined if bits 9 - 55 of the bitmap table entry are zero
(see above).

Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
bit offset into the image file to the corresponding bit of the bitmap can be
calculated like this:

    bit_offset(byte_nr) =
        image_offset(byte_nr / granularity / 8) * 8 +
            (byte_nr / granularity) % 8

If the size of the bitmap data is not a multiple of the cluster size then the
last cluster of the bitmap data contains some unused tail bits. These bits must
be zero.


=== Dirty tracking bitmaps ===

Bitmaps with 'type' field equal to one are dirty tracking bitmaps.

When the virtual disk is in use, a dirty tracking bitmap may be 'enabled' or
'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
should be reflected in the bitmap. A set bit in the bitmap means that the
corresponding range of the virtual disk (see above) was written to while the
bitmap was 'enabled'. An unset bit means that this range was not written to.

The software doesn't have to sync the bitmap in the image file with its
representation in RAM after each write.
Flag 'in_use' should be set while the bitmap is not synced.

In the image file the 'enabled' state is reflected by the 'auto' flag. If this
flag is set, the software must consider the bitmap as 'enabled' and start
tracking virtual disk changes to this bitmap from the first write to the
virtual disk. If this flag is not set then the bitmap is disabled.
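
As an illustration of how writes could be reflected in an enabled dirty
tracking bitmap held in RAM, the Python sketch below maps a guest write to the
bit numbers it touches and sets them. The function names are hypothetical.
The sketch also assumes that, within each byte of bitmap data, bit_nr % 8 is
counted from the least significant bit; this section does not state the bit
order explicitly, so treat that as an assumption.

```python
def dirty_bit_range(offset: int, length: int, granularity_bits: int) -> range:
    """Bit numbers covered by a write of length bytes at guest offset."""
    if length <= 0:
        return range(0)
    first = offset >> granularity_bits
    last = (offset + length - 1) >> granularity_bits
    return range(first, last + 1)


def mark_dirty(bitmap: bytearray, offset: int, length: int,
               granularity_bits: int) -> None:
    """Set every bit whose range overlaps the write (in-RAM copy only)."""
    for bit_nr in dirty_bit_range(offset, length, granularity_bits):
        # Assumption: bit 0 of a byte is its least significant bit.
        bitmap[bit_nr // 8] |= 1 << (bit_nr % 8)
```

A write that straddles a granularity boundary sets two bits even if it is
shorter than the granularity, because each bit stands for a fixed range of
the virtual disk rather than for an individual write.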