=Specification=

The file format looks like this:

 +----------+----------+----------+-----+
 | cluster0 | cluster1 | cluster2 | ... |
 +----------+----------+----------+-----+

The first cluster begins with the '''header'''.  The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file.  A regular cluster may be a '''data cluster''', an '''L2 table''', or an '''L1 table'''.  L1 and L2 tables are composed of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size.  If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written.  Legitimate extra information should use space between the header and the first regular cluster.

All fields are little-endian.

==Header==
 Header {
     uint32_t magic;               /* QED\0 */

     uint32_t cluster_size;        /* in bytes */
     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
     uint32_t header_size;         /* in clusters */

     uint64_t features;            /* format feature bits */
     uint64_t compat_features;     /* compat feature bits */
     uint64_t autoclear_features;  /* self-resetting feature bits */

     uint64_t l1_table_offset;     /* in bytes */
     uint64_t image_size;          /* total logical image size, in bytes */

     /* if (features & QED_F_BACKING_FILE) */
     uint32_t backing_filename_offset; /* in bytes from start of header */
     uint32_t backing_filename_size;   /* in bytes */
 }

Field descriptions:
* ''cluster_size'' must be a power of 2 in range [2^12, 2^26].
* ''table_size'' must be a power of 2 in range [1, 16].
* ''header_size'' is the number of clusters used by the header and any additional information stored before regular clusters.
* ''features'', ''compat_features'', and ''autoclear_features'' are file format extension bitmaps.  They work as follows:
** An image with unknown ''features'' bits enabled must not be opened.  File format changes that are not backwards-compatible must use ''features'' bits.
** An image with unknown ''compat_features'' bits enabled can be opened safely.  The unknown features are simply ignored and represent backwards-compatible changes to the file format.
** An image with unknown ''autoclear_features'' bits enabled can be opened safely after clearing the unknown bits.  This allows for backwards-compatible changes to the file format which degrade gracefully and can be re-enabled by a new program later.
* ''l1_table_offset'' is the offset of the first byte of the L1 table in the image file and must be a multiple of ''cluster_size''.
* ''image_size'' is the block device size seen by the guest and must be a multiple of 512 bytes.
* ''backing_filename_offset'' and ''backing_filename_size'' describe a string in (byte offset, byte size) form.  It is not NUL-terminated and has no alignment constraints.  The string must be stored within the first ''header_size'' clusters.  The backing filename may be an absolute path or relative to the image file.
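The field constraints above can be sketched as a small parser.  This is an illustration only, not the reference implementation: the names ''HEADER_STRUCT'', ''is_power_of_two'', and ''parse_header'' are hypothetical, and the code assumes that the magic comment denotes the four bytes as they appear on disk.

```python
import struct

# Field order and widths follow the Header struct above.
# "<" selects little-endian with no implicit padding (64 bytes in total).
HEADER_STRUCT = struct.Struct("<4sIIIQQQQQII")
QED_MAGIC = b"QED\0"  # assumption: on-disk byte order of the magic

FIELD_NAMES = (
    "magic", "cluster_size", "table_size", "header_size",
    "features", "compat_features", "autoclear_features",
    "l1_table_offset", "image_size",
    "backing_filename_offset", "backing_filename_size",
)


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def parse_header(buf):
    """Unpack the fixed header fields and check the documented constraints."""
    hdr = dict(zip(FIELD_NAMES, HEADER_STRUCT.unpack_from(buf)))
    if hdr["magic"] != QED_MAGIC:
        raise ValueError("bad magic")
    cluster_size = hdr["cluster_size"]
    if not is_power_of_two(cluster_size) or not 2**12 <= cluster_size <= 2**26:
        raise ValueError("cluster_size must be a power of 2 in [2^12, 2^26]")
    table_size = hdr["table_size"]
    if not is_power_of_two(table_size) or not 1 <= table_size <= 16:
        raise ValueError("table_size must be a power of 2 in [1, 16]")
    if hdr["l1_table_offset"] % cluster_size != 0:
        raise ValueError("l1_table_offset must be a multiple of cluster_size")
    if hdr["image_size"] % 512 != 0:
        raise ValueError("image_size must be a multiple of 512 bytes")
    return hdr
```

A caller would read at least ''HEADER_STRUCT.size'' bytes from the start of the image file and pass them to ''parse_header''.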

Feature bits:
* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.
* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
* QED_F_BACKING_FORMAT_NO_PROBE = 0x04.  The backing file is a raw disk image and no file format autodetection should be attempted.  This should be used to ensure that raw backing files are never detected as an image format if they happen to contain magic constants.

There are currently no defined ''compat_features'' or ''autoclear_features'' bits.

Fields predicated on a feature bit are only used when that feature is set.  The fields always take up header space, regardless of whether or not the feature bit is set.
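The handling of the three bitmaps, and of a field predicated on a feature bit, might look as follows.  This is a sketch under stated assumptions: ''check_features'' and ''backing_filename'' are hypothetical helper names, and the reader is assumed to understand exactly the feature bits listed above.

```python
QED_F_BACKING_FILE = 0x01
QED_F_NEED_CHECK = 0x02
QED_F_BACKING_FORMAT_NO_PROBE = 0x04

# Bits this hypothetical reader understands.
KNOWN_FEATURES = (QED_F_BACKING_FILE | QED_F_NEED_CHECK
                  | QED_F_BACKING_FORMAT_NO_PROBE)
KNOWN_AUTOCLEAR = 0  # no autoclear bits are currently defined


def check_features(features, compat_features, autoclear_features):
    """Return the autoclear value to keep, or raise if the image must not be opened."""
    unknown = features & ~KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown features bits: {unknown:#x}")
    # Unknown compat_features bits are ignored on purpose.
    # Unknown autoclear_features bits are cleared before the image is used.
    return autoclear_features & KNOWN_AUTOCLEAR


def backing_filename(header_bytes, features, offset, size):
    """Return the raw backing filename bytes, or None if the feature is unset."""
    if not features & QED_F_BACKING_FILE:
        return None  # the fields still occupy header space but are not used
    return bytes(header_bytes[offset:offset + size])
```

The filename is returned as bytes because the specification does not state a character encoding.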

==Tables==

Tables provide the translation from logical offsets in the block device to cluster offsets in the file.

 #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))

 Table {
     uint64_t offsets[TABLE_NOFFSETS];
 }

The tables are organized as follows:

                    +----------+
                    | L1 table |
                    +----------+
               ,------'  |  '------.
          +----------+   |    +----------+
          | L2 table |  ...   | L2 table |
          +----------+        +----------+
      ,------'  |  '------.
 +----------+   |    +----------+
 |   Data   |  ...   |   Data   |
 +----------+        +----------+

A table is made up of one or more contiguous clusters.  The table_size header field determines table size for an image file.  For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.

The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:
 header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
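As a worked example of the size limit, the arithmetic can be written out as below.  The function names are hypothetical; the formulas are the ones given above.

```python
def table_noffsets(cluster_size, table_size):
    """Number of 64-bit offsets held by one table (TABLE_NOFFSETS)."""
    return table_size * cluster_size // 8


def max_image_size(cluster_size, table_size):
    """Largest logical image size addressable through the L1 table, in bytes."""
    n = table_noffsets(cluster_size, table_size)
    return n * n * cluster_size


# cluster_size = 64 KB and table_size = 4, as in the example above
noffsets = table_noffsets(64 * 1024, 4)    # 32768 offsets per table
limit = max_image_size(64 * 1024, 4)       # 2**46 bytes
```

With these parameters each table holds 32768 offsets, so the logical image size may be at most 2^46 bytes.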

L1, L2, and data cluster offsets must be aligned to header.cluster_size.  The following offsets have special meanings:

===L2 table offsets===
* 0 - unallocated.  The L2 table is not yet allocated.

===Data cluster offsets===
* 0 - unallocated.  The data cluster is not yet allocated.
* 1 - zero.  The data cluster contents are all zeroes and no cluster is allocated.

Future format extensions may wish to store per-offset information.  The least significant 12 bits of an offset are reserved for this purpose and must be set to zero.  Image files with cluster_size > 2^12 will have more unused bits which should also be zeroed.
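One way to interpret table entries under these rules is sketched below.  The function names and the returned state strings are assumptions made for illustration.

```python
def classify_l2_table_offset(offset, cluster_size):
    """Interpret an L1 table entry, which refers to an L2 table."""
    if offset == 0:
        return "unallocated"
    if offset % cluster_size != 0:
        raise ValueError("L2 table offset is not cluster aligned")
    return "allocated"


def classify_data_offset(offset, cluster_size):
    """Interpret an L2 table entry, which refers to a data cluster."""
    if offset == 0:
        return "unallocated"
    if offset == 1:
        return "zero"
    # cluster_size is at least 2^12, so this also rejects any other
    # value with one of the reserved low 12 bits set.
    if offset % cluster_size != 0:
        raise ValueError("data cluster offset is not cluster aligned")
    return "allocated"
```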

===Unallocated L2 tables and data clusters===
Reads to an unallocated area of the image file access the backing file.  If there is no backing file, then zeroes are produced.  The backing file may be smaller than the image file and reads of unallocated areas beyond the end of the backing file produce zeroes.

Writes to an unallocated area cause a new data cluster to be allocated, and a new L2 table if that is also unallocated.  The new data cluster is populated with data from the backing file (or zeroes if no backing file) and the data being written.

===Zero data clusters===
Zero data clusters are a space-efficient way of storing zeroed regions of the image.

Reads to a zero data cluster produce zeroes.  Note that the difference between an unallocated and a zero data cluster is that zero data clusters stop the reading of contents from the backing file.

Writes to a zero data cluster cause a new data cluster to be allocated.  The new data cluster is populated with zeroes and the data being written.
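The read behaviour for unallocated, zero, and allocated data clusters can be modelled as below.  This is a simplified sketch: the image and backing files are assumed to be in-memory bytes objects, ''read_data_cluster'' is a hypothetical name, and ''logical_offset'' is assumed to be the cluster-aligned logical offset of the cluster being read.

```python
def read_data_cluster(entry, logical_offset, cluster_size, image, backing=None):
    """Return the contents of one whole cluster given its L2 table entry."""
    if entry == 1:
        # Zero cluster: never consult the backing file.
        return bytes(cluster_size)
    if entry == 0:
        # Unallocated: fall through to the backing file, if there is one.
        if backing is None:
            return bytes(cluster_size)
        data = backing[logical_offset:logical_offset + cluster_size]
        # Reads beyond the end of the backing file produce zeroes.
        return data + bytes(cluster_size - len(data))
    # Allocated: the entry is the byte offset of the cluster in the image file.
    return image[entry:entry + cluster_size]
```

A write to an unallocated or zero cluster would start from the same bytes this function returns, overlay the data being written, and store the result in a newly allocated cluster.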

===Logical offset translation===
Logical offsets are translated into cluster offsets as follows:

  table_bits table_bits    cluster_bits
  <--------> <--------> <--------------->
 +----------+----------+-----------------+
 | L1 index | L2 index |     byte offset |
 +----------+----------+-----------------+

       Structure of a logical offset

 offset_mask = ~(cluster_size - 1) # mask for the image file byte offset

 def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
   l2_offset = l1_table[l1_index]
   l2_table = load_table(l2_offset)
   cluster_offset = l2_table[l2_index] & offset_mask
   return cluster_offset + byte_offset
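The function above takes the three components of a logical offset as separate arguments.  Splitting a logical offset into those components, following the bit layout in the diagram, might look like this.  The name ''split_logical_offset'' is hypothetical, and the bit widths are derived on the assumption that ''cluster_size'' and TABLE_NOFFSETS are powers of 2, as the header constraints require.

```python
def split_logical_offset(pos, cluster_size, table_size):
    """Split a logical offset into (l1_index, l2_index, byte_offset)."""
    # For a power of 2, bit_length() - 1 is its base-2 logarithm.
    cluster_bits = cluster_size.bit_length() - 1
    table_noffsets = table_size * cluster_size // 8
    table_bits = table_noffsets.bit_length() - 1

    byte_offset = pos & (cluster_size - 1)
    l2_index = (pos >> cluster_bits) & ((1 << table_bits) - 1)
    l1_index = pos >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset
```

For cluster_size=64 KB and table_size=4 this gives cluster_bits=16 and table_bits=15.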

==Consistency checking==

This section is informational and included to provide background on the use of the QED_F_NEED_CHECK ''features'' bit.

The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an operation that could leave the image in an inconsistent state if interrupted by a crash or power failure.  A dirty image must be checked on open because its metadata may not be consistent.

The consistency check verifies the following invariants:
# Each cluster is referenced once and only once.  It is an inconsistency to have a cluster referenced more than once by L1 or L2 tables.  A cluster has been leaked if it has no references.
# Offsets must be within the image file size and must be ''cluster_size'' aligned.
# Table offsets must be at least ''table_size'' * ''cluster_size'' bytes from the end of the image file so that there is space for the entire table.

The consistency check process starts from ''l1_table_offset'' and scans all L2 tables.  After the check completes with no other errors besides leaks, the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.
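The invariants can be illustrated with a reference-counting pass over an in-memory model of the tables.  This is a sketch, not the checker used by any implementation: ''check_image'' is a hypothetical name, ''l1_table'' is assumed to be a list of L2 table offsets, and ''l2_tables'' is assumed to map each L2 table offset to its list of data cluster offsets.

```python
def check_image(l1_offset, l1_table, l2_tables, file_size,
                cluster_size, table_size, header_size):
    """Return (errors, leaks) for an in-memory model of the image metadata."""
    table_bytes = table_size * cluster_size
    refs = {}      # cluster offset -> number of references
    errors = []

    def claim(offset, nbytes, what):
        if offset % cluster_size != 0:
            errors.append(f"{what} at {offset} is not cluster aligned")
            return False
        if offset + nbytes > file_size:
            errors.append(f"{what} at {offset} extends past end of file")
            return False
        ok = True
        for cluster in range(offset, offset + nbytes, cluster_size):
            refs[cluster] = refs.get(cluster, 0) + 1
            if refs[cluster] > 1:
                errors.append(f"cluster at {cluster} referenced more than once")
                ok = False
        return ok

    claim(l1_offset, table_bytes, "L1 table")
    for l2_offset in l1_table:
        if l2_offset == 0:            # unallocated L2 table
            continue
        if not claim(l2_offset, table_bytes, "L2 table"):
            continue
        for data_offset in l2_tables[l2_offset]:
            if data_offset in (0, 1):  # unallocated or zero cluster
                continue
            claim(data_offset, cluster_size, "data cluster")

    # Regular clusters that nothing refers to have been leaked.
    first_regular = header_size * cluster_size
    end = file_size - file_size % cluster_size
    leaks = [c for c in range(first_regular, end, cluster_size) if c not in refs]
    return errors, leaks
```

In this model the QED_F_NEED_CHECK bit could be cleared when ''errors'' is empty, even if ''leaks'' is not.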