XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and the
total live-migration time of virtual machines. It is particularly useful for
virtual machines running memory-write-intensive workloads, which are typical
of large enterprise applications such as SAP ERP systems, and more generally
for any application with a sparse memory update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent
during live migration.
To calculate the update, the previous content of each memory page needs to
be stored on the source. Those pages are stored in a dedicated cache (hash
table) and are accessed by their address.
The larger the cache, the better the chances that the page has already been
stored in it; a small cache will result in a high cache miss rate.
The cache size can be changed before and during migration.
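A cache of this kind can be sketched as a direct-mapped table indexed by page address. The sketch below is a minimal illustration, not QEMU's page cache: the type and function names, the fixed 4096-byte page size and the overwrite-on-store policy are all assumptions. Every page address maps to exactly one slot, so pages that share a slot evict each other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size */

/* Hypothetical cache slot holding one previously sent page. */
typedef struct {
    uint64_t addr;                    /* guest address of the cached page */
    bool valid;                       /* false until something is stored */
    uint8_t data[SKETCH_PAGE_SIZE];   /* page content as last sent */
} CacheSlot;

/* Hypothetical direct-mapped page cache. */
typedef struct {
    CacheSlot *slots;
    size_t num_slots;                 /* cache size / page size */
} PageCacheSketch;

/* Allocate a cache of cache_bytes; returns false if it cannot hold a page. */
bool cache_init(PageCacheSketch *c, size_t cache_bytes)
{
    c->num_slots = cache_bytes / SKETCH_PAGE_SIZE;
    c->slots = c->num_slots ? calloc(c->num_slots, sizeof(CacheSlot)) : NULL;
    return c->slots != NULL;
}

void cache_free(PageCacheSketch *c)
{
    free(c->slots);
    c->slots = NULL;
    c->num_slots = 0;
}

/* Every address maps to exactly one slot. */
static CacheSlot *slot_for(PageCacheSketch *c, uint64_t addr)
{
    return &c->slots[(addr / SKETCH_PAGE_SIZE) % c->num_slots];
}

/* Returns the previously stored content of the page at addr, or NULL on a
 * cache miss (slot empty or occupied by another page). */
const uint8_t *cache_lookup(PageCacheSketch *c, uint64_t addr)
{
    CacheSlot *s = slot_for(c, addr);
    return (s->valid && s->addr == addr) ? s->data : NULL;
}

/* Stores the page, evicting whichever page occupied its slot. */
void cache_store(PageCacheSketch *c, uint64_t addr, const uint8_t *page)
{
    CacheSlot *s = slot_for(c, addr);
    s->addr = addr;
    s->valid = true;
    memcpy(s->data, page, SKETCH_PAGE_SIZE);
}
```

With fewer slots, more addresses collide on the same slot and keep evicting each other, which is the mechanism behind the high miss rate of a small cache.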

Format
======

The compression format performs an XOR between the previous and current
content of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).
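ULEB128 stores an unsigned integer in groups of 7 bits, least significant group first, and sets the high bit of every byte except the last. A minimal C sketch follows; the helper names are hypothetical and the code illustrates the encoding only, not QEMU's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write v as ULEB128 into out (at most 5 bytes for a
 * 32-bit value) and return the number of bytes written. */
size_t uleb128_encode(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = v & 0x7f;    /* low 7 bits of the remaining value */
        v >>= 7;
        if (v != 0) {
            byte |= 0x80;           /* continuation bit: more bytes follow */
        }
        out[n++] = byte;
    } while (v != 0);
    return n;
}

/* Hypothetical helper: read a ULEB128 value from in (len bytes available)
 * into *v. Returns the number of bytes consumed, or 0 if the input is
 * truncated or longer than 5 bytes. */
size_t uleb128_decode(const uint8_t *in, size_t len, uint32_t *v)
{
    uint32_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len && shift < 32; i++) {
        result |= (uint32_t)(in[i] & 0x7f) << shift;
        if (!(in[i] & 0x80)) {      /* last byte of this value */
            *v = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;
}
```

For instance, 1001 encodes as e9 07 and 15 as the single byte 0f, both of which appear in the Example section.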

There can be more than one valid encoding; the sender may send a longer
encoding in order to reduce computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer
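A minimal encoder that follows this grammar can be sketched in C. It is a bytewise illustration under stated assumptions, not QEMU's encoder: the function names are hypothetical, and it returns -1 when the encoding would not fit in dlen bytes, so that a caller could fall back to sending the page in full. The trailing zero run is not emitted, because the receiver keeps its existing bytes for the rest of the page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append v as ULEB128 at out[*pos], refusing to write past dlen. */
static int put_uleb128(uint32_t v, uint8_t *out, size_t *pos, size_t dlen)
{
    do {
        if (*pos >= dlen) {
            return -1;
        }
        uint8_t byte = v & 0x7f;
        v >>= 7;
        out[(*pos)++] = byte | (v ? 0x80 : 0);
    } while (v != 0);
    return 0;
}

/* Hypothetical XBZRLE encoder sketch: emits zrun/nzrun pairs describing
 * how new_page differs from old_page. Returns the encoded length, or -1
 * if the encoding does not fit in dlen bytes (an overflow). */
long xbzrle_encode_sketch(const uint8_t *old_page, const uint8_t *new_page,
                          size_t slen, uint8_t *dst, size_t dlen)
{
    size_t i = 0, pos = 0;

    while (i < slen) {
        /* zrun: bytes whose XOR is zero, i.e. unchanged. */
        size_t zrun = 0;
        while (i < slen && old_page[i] == new_page[i]) {
            zrun++;
            i++;
        }
        if (i == slen) {
            break;              /* trailing zero run is not encoded */
        }

        /* nzrun: bytes whose XOR is non-zero, i.e. changed. */
        size_t start = i;
        while (i < slen && old_page[i] != new_page[i]) {
            i++;
        }
        size_t nzrun = i - start;

        if (put_uleb128((uint32_t)zrun, dst, &pos, dlen) < 0 ||
            put_uleb128((uint32_t)nzrun, dst, &pos, dlen) < 0 ||
            dlen - pos < nzrun) {
            return -1;
        }
        memcpy(dst + pos, new_page + start, nzrun);   /* the new data */
        pos += nzrun;
    }
    return (long)pos;
}
```

Run on the old and new buffers from the Example section, this sketch produces the 24-byte encoding shown there.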

On the sender side, XBZRLE is used as a compact delta encoding of page
updates, retrieving the old page content from the cache (default size of
64MB). The receiving side uses the existing page's content and XBZRLE to
decode the new page's content.
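The receiving side can be sketched the same way: the decoder below applies an encoded delta to the page content the destination already holds. The names are hypothetical and the processing is bytewise; this is an illustration, not QEMU's decoder. It rejects malformed input rather than writing past the end of the page.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one ULEB128 value starting at src[*pos]; returns -1 on truncated
 * input or on a value longer than 4 bytes (more than any page needs). */
static int get_uleb128(const uint8_t *src, size_t slen, size_t *pos,
                       uint32_t *v)
{
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 28; shift += 7) {
        if (*pos >= slen) {
            return -1;
        }
        uint8_t byte = src[(*pos)++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        if (!(byte & 0x80)) {
            *v = result;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical XBZRLE decoder sketch: applies the encoded delta in src to
 * page, which must already hold the old content. Bytes not covered by a
 * nzrun keep their old value. Returns 0 on success, -1 on bad input. */
int xbzrle_decode_sketch(const uint8_t *src, size_t slen,
                         uint8_t *page, size_t page_size)
{
    size_t pos = 0, off = 0;

    while (pos < slen) {
        uint32_t zrun, nzrun;

        /* zrun: skip over unchanged bytes. */
        if (get_uleb128(src, slen, &pos, &zrun) < 0 ||
            zrun > page_size - off) {
            return -1;
        }
        off += zrun;

        /* nzrun: copy the new bytes over the old ones. */
        if (get_uleb128(src, slen, &pos, &nzrun) < 0 ||
            nzrun > page_size - off || nzrun > slen - pos) {
            return -1;
        }
        memcpy(page + off, src + pos, nzrun);
        off += nzrun;
        pos += nzrun;
    }
    return 0;
}
```

Applied to the old buffer and the 24-byte encoded buffer from the Example section, it reproduces the new buffer.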

This work was originally based on research results published at VEE 2011:
"Evaluation of Delta Compression Techniques for Efficient Live Migration of
Large Virtual Machines" by Benoit, Svard, Tordsson and Elmroth.
Additionally, the XBRLE delta encoder was further improved by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making
it ideal for in-line, real-time encoding such as is needed for live migration.

Example
=======
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69
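The 24 encoded bytes can be checked run by run. 1001 does not fit in 7 bits, so its ULEB128 form takes two bytes: the low 7 bits (105 = 0x69) with the continuation bit set give 0xe9, and the remaining 1001 >> 7 = 7 gives 0x07. The final 3074 unchanged bytes are not encoded at all.

```latex
\begin{aligned}
\text{zrun } 1001 &\;\to\; \texttt{e9 07} && 2 \text{ bytes} \\
\text{nzrun } 15 &\;\to\; \texttt{0f}\ \texttt{01}\ldots\texttt{0f} && 1 + 15 = 16 \text{ bytes} \\
\text{zrun } 3\ (\texttt{68 00 00}\ \text{unchanged}) &\;\to\; \texttt{03} && 1 \text{ byte} \\
\text{nzrun } 1 &\;\to\; \texttt{01 67} && 2 \text{ bytes} \\
\text{zrun } 1 &\;\to\; \texttt{01} && 1 \text{ byte} \\
\text{nzrun } 1 &\;\to\; \texttt{01 69} && 2 \text{ bytes} \\
\text{total} &= 2 + 16 + 1 + 2 + 1 + 2 = 24 \text{ bytes}
\end{aligned}
```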

Cache update strategy
=====================
Keeping hot pages in the cache is effective for decreasing cache misses.
XBZRLE uses a counter as the age of each page; the counter increases after
each RAM dirty bitmap sync. When a cache conflict is detected, XBZRLE only
evicts a cached page if it is older than a threshold.
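The age-based policy can be modelled as below. This is a sketch under assumptions, not QEMU's implementation: the names and the AGE_THRESHOLD value are hypothetical, page data is omitted, and current_age stands for the number of dirty bitmap syncs so far. Each slot remembers the sync count at which it was last stored or hit; on a conflict, the resident page is replaced only if it has gone at least AGE_THRESHOLD syncs without use.

```c
#include <stdbool.h>
#include <stdint.h>

#define AGE_THRESHOLD 2   /* hypothetical: syncs a page must sit unused */

/* Hypothetical cache slot metadata (page data omitted for brevity). */
typedef struct {
    uint64_t addr;
    uint64_t age;     /* sync count when the slot was last stored or hit */
    bool valid;
} SlotMeta;

/* Try to place the page at addr into slot s during sync number
 * current_age (assumed never to decrease). Returns true if the page was
 * stored or refreshed, false if a recently used page was kept instead. */
bool cache_try_insert(SlotMeta *s, uint64_t addr, uint64_t current_age)
{
    if (s->valid && s->addr != addr &&
        current_age - s->age < AGE_THRESHOLD) {
        return false;             /* conflict with a hot page: keep it */
    }
    s->addr = addr;
    s->age = current_age;         /* a hit also refreshes the age */
    s->valid = true;
    return true;
}
```

Refusing to evict a recently used page means a newly dirtied page may go uncached for a while, in exchange for keeping frequently rewritten pages available for delta encoding.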

Usage
=====
1. Verify that the destination QEMU version is able to decode the new format:
    {qemu} info migrate_capabilities
    xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on the source only). The cache size is in
MBytes and should be a power of 2; the default is 64 MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start the outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache miss: the number of cache misses to date. A high cache miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows in the encoding, i.e. pages whose
delta could not be compressed. This can happen if the changes in the pages
are too large or if there are many short changes; for example, changing every
second byte (half a page).

Testing: live migration with XBZRLE completed in 110 seconds, whereas without
XBZRLE the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 4096 * 4096 bytes = 16 MiB, zero-initialized */
        char *buf = calloc(4096, 4096);
        if (buf == NULL) {
            perror("calloc");
            return 1;
        }
        while (1) {
            int i;
            /* touch one byte every 1024 bytes: 4 writes per 4096 bytes */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }
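As a rough illustration of why this kind of load suits XBZRLE: the loop above writes one byte every 1024 bytes, so each 4096-byte page gets four single-byte changes. Assuming the buffer is page-aligned, 4096-byte guest pages, and that every touched byte differs from its cached copy, the delta of one page encodes in 15 bytes (a zero run of 1023 needs two ULEB128 bytes because it is at least 128, and the final 1023 unchanged bytes are not encoded):

```latex
\underbrace{1}_{\text{zrun } 0}
+ \underbrace{2}_{\text{nzrun } 1 + \text{data}}
+ 3 \times \bigl(\underbrace{2}_{\text{zrun } 1023} + \underbrace{2}_{\text{nzrun } 1 + \text{data}}\bigr)
= 15 \text{ bytes} \ll 4096 \text{ bytes}
```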
136..    }
137