qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015, 2018 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
at least 512 bytes. QEMU currently defaults to 64 KB clusters, and it
does not support sizes larger than 2 MB.

The 'qemu-img create' command supports specifying the size using the
cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G


The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is a single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read or write data from the virtual disk, QEMU needs to read its
corresponding L2 table to find out where that data is located. Since
reading the table for each I/O operation can be expensive, QEMU keeps
an L2 cache in memory to speed up disk access.
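
The two-level lookup described above can be illustrated with a small
Python sketch. It is not QEMU's code: the function names are
hypothetical, and the 8-byte L2 entry size is taken from the qcow2
format specification rather than from this section.

```python
# Illustrative sketch only; helper names are hypothetical, not QEMU APIs.
L2_ENTRY_SIZE = 8  # bytes per L2 entry (assumption from the qcow2 spec)


def split_guest_offset(offset, cluster_size=64 * 1024):
    """Return (l1_index, l2_index, offset_in_cluster) for a guest offset."""
    if cluster_size < 512 or cluster_size & (cluster_size - 1):
        raise ValueError("cluster_size must be a power of two >= 512")
    # Each L2 table occupies exactly one cluster.
    entries_per_l2_table = cluster_size // L2_ENTRY_SIZE
    cluster_index = offset // cluster_size
    l1_index = cluster_index // entries_per_l2_table  # which L2 table
    l2_index = cluster_index % entries_per_l2_table   # entry inside it
    return l1_index, l2_index, offset % cluster_size


def bytes_mapped_per_l2_table(cluster_size=64 * 1024):
    """Amount of guest data that a single L2 table can map."""
    return (cluster_size // L2_ENTRY_SIZE) * cluster_size
```

With 64 KB clusters, each L2 table holds 8192 entries and therefore maps
512 MB of guest data, so an access pattern that spans many such regions
needs many L2 tables in the cache.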

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.


The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Like L2 tables,
each one is one cluster in size, and their number varies with the
amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.


Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits

With the default values for cluster_size (64 KB) and refcount_bits
(16), this becomes:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768

So in order to cover disk_size_GB gigabytes of disk space with the
default values, the cache sizes (in bytes) must be:

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768

For example, 1 MB of L2 cache is needed to cover every 8 GB of the virtual
image size (given that the default cluster size is used):

   8 GB / 8192 = 1 MB

The refcount cache is 4 times the cluster size by default.
With the default cluster size of 64 KB, it is 256 KB (262144 bytes).
This is sufficient for 8 GB of image size:

   262144 * 32768 = 8 GB


How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option on the
command line, or the 'blockdev-add' QMP command.

There are three options available, and all of them take bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are a few things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster size
   (or the cache entry size: see "Using smaller cache entries" below).

 - The maximum L2 cache size is 32 MB by default on Linux platforms (enough
   for full coverage of 256 GB images, with the default cluster size). This
   value can be modified using the "l2-cache-size" option. QEMU will not use
   more memory than needed to hold all of the image's L2 tables, regardless
   of this maximum.
   On non-Linux platforms the default maximum is smaller (8 MB). The
   difference stems from the fact that on Linux the cache can be cleared
   periodically if needed, using the "cache-clean-interval" option (see below).
   The minimum L2 cache size is 2 clusters (or 2 cache entries, see below).

 - The default (and minimum) refcount cache size is 4 clusters.

 - If only "cache-size" is specified then QEMU will assign as much
   memory as possible to the L2 cache before increasing the refcount
   cache size.

 - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
   can be set simultaneously.

Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots.
In most cases they are accessed sequentially (even during random guest
I/O), so increasing the refcount cache size won't have any measurable
effect on performance. This can change if you use internal snapshots
heavily, in which case you may want to increase the cache size.

Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
L2 cache size. This resulted in unnecessarily large caches, so now the
refcount cache is as small as possible unless overridden by the user.


Using smaller cache entries
---------------------------
The qcow2 L2 cache stores complete tables by default. This means that
if QEMU needs an entry from an L2 table then the whole table is read
from disk and is kept in the cache. If the cache is full then a
complete table needs to be evicted first.

This can be inefficient with large cluster sizes since it results in
more disk I/O and wastes more cache memory.

Since QEMU 2.12 you can change the size of the L2 cache entry and make
it smaller than the cluster size. This can be configured using the
"l2-cache-entry-size" parameter:

   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096

Some things to take into account:

 - The L2 cache entry size has the same restrictions as the cluster
   size (power of two, at least 512 bytes).

 - Smaller entry sizes generally improve the cache efficiency and make
   disk I/O faster. This is particularly true with solid state drives
   so it's a good idea to reduce the entry size in those cases. With
   rotating hard drives the situation is a bit more complicated so you
   should test it first and stay with the default size if unsure.

 - Try different entry sizes to see which one performs best in your
   case. The block size of the host filesystem is generally a good
   default (usually 4096 bytes in the case of ext4).

 - Only the L2 cache can be configured this way. The refcount cache
   always uses the cluster size as the entry size.

 - If the L2 cache is big enough to hold all of the image's L2 tables
   (as explained in the "Choosing the right cache sizes" and "How to
   configure the cache sizes" sections in this document) then none of
   this is necessary and you can omit the "l2-cache-entry-size"
   parameter altogether.


Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds),
after which all the cache entries that haven't been accessed during the
interval are removed from memory. Setting this parameter to 0 disables this
feature.

The following example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900

If unset, the default value for this parameter is 600 seconds on platforms
which support this functionality, and 0 (disabled) on other platforms.

This functionality currently relies on the MADV_DONTNEED argument for
madvise() to actually free the memory. This is a Linux-specific feature,
so cache-clean-interval is not supported on other systems.
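
As a closing illustration, the formulas from "Choosing the right cache
sizes" can be turned into a small Python helper. This is only a sketch:
the function names are hypothetical, the rounding and minimum sizes
follow the rules listed in "How to configure the cache sizes", and the
resulting string is meant to be passed to -drive by hand.

```python
# Illustrative sketch only; helper names are hypothetical, not QEMU APIs.
def _ceil_div(a, b):
    return -(-a // b)


def full_coverage_cache_sizes(disk_size, cluster_size=64 * 1024,
                              refcount_bits=16):
    """Return (l2_cache_size, refcount_cache_size) in bytes that would
    map disk_size bytes of virtual disk."""
    # disk_size = l2_cache_size * cluster_size / 8
    l2 = _ceil_div(disk_size * 8, cluster_size)
    # disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
    refcount = _ceil_div(disk_size * refcount_bits, cluster_size * 8)
    # Both caches must be a multiple of the cluster size.
    l2 = _ceil_div(l2, cluster_size) * cluster_size
    refcount = _ceil_div(refcount, cluster_size) * cluster_size
    # Minimum sizes: 2 clusters for L2, 4 clusters for refcounts.
    return max(l2, 2 * cluster_size), max(refcount, 4 * cluster_size)


def drive_option(path, l2_cache_size):
    """Build the value for -drive, with the size given in bytes."""
    return f"file={path},l2-cache-size={l2_cache_size}"
```

For an 8 GB image with the default settings this yields a 1 MB L2 cache
and a 256 KB refcount cache, matching the figures given earlier. Only
"l2-cache-size" is emitted because, as noted above, enlarging the
refcount cache rarely has a measurable effect.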