Use multiple thread (de)compression in live migration
=====================================================
Copyright (C) 2015 Intel Corporation
Author: Liang Li <liang.z.li@intel.com>

This work is licensed under the terms of the GNU GPLv2 or later. See
the COPYING file in the top-level directory.

Contents:
=========
* Introduction
* When to use
* Performance
* Usage
* TODO

Introduction
============
Instead of sending the guest memory directly, this solution compresses
each RAM page before sending it; the destination decompresses the data
after receiving it. Using compression in live migration can reduce the
amount of data transferred by about 60%, which is very useful when the
bandwidth is limited, and the total migration time can also be reduced
by about 70% in a typical case. In addition, the VM downtime can be
reduced by about 50%. The benefit depends on how compressible the data
in the VM is.

Compression consumes additional CPU cycles, and those extra cycles
increase the migration time. On the other hand, the amount of data
transferred decreases, which reduces the total migration time. If
compression is quick enough, the total migration time is reduced
overall, and multiple compression threads can be used to accelerate
the compression process.

Zlib decompression is at least 4 times as fast as compression. If the
source and destination CPUs have equal speed, keeping the compression
thread count at 4 times the decompression thread count avoids wasting
resources.

The compression level controls the trade-off between compression speed
and compression ratio. A higher compression ratio takes more time:
level 0 stands for no compression, level 1 stands for the best
compression speed, and level 9 stands for the best compression ratio.
Users can select a level number between 0 and 9.
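
The effect of the compression level and the 4:1 thread ratio described
above can be tried outside QEMU. The following is a minimal Python
sketch, not QEMU code: it uses Python's standard zlib module on a
synthetic 4096-byte buffer. PAGE_SIZE, the sample data and both helper
functions are assumptions made for illustration only.

```python
import math
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def compressed_sizes(page, levels=(0, 1, 9)):
    """Return a dict mapping each zlib level to the compressed size of `page`."""
    return {level: len(zlib.compress(page, level)) for level in levels}


def matching_decompress_threads(compress_threads, ratio=4):
    """Hypothetical helper: decompression thread count for a given
    compression thread count, using the 4:1 ratio from the text."""
    return max(1, math.ceil(compress_threads / ratio))


if __name__ == "__main__":
    # Synthetic, highly compressible "page": repeated text cut to PAGE_SIZE.
    page = (b"guest memory " * 400)[:PAGE_SIZE]
    for level, size in compressed_sizes(page).items():
        print(f"level {level}: {len(page)} -> {size} bytes")
    # The destination must recover exactly what the source compressed.
    assert zlib.decompress(zlib.compress(page, 1)) == page
    print("decompress threads for 8 compress threads:",
          matching_decompress_threads(8))
```

Level 0 only wraps the data in the zlib container, so its output is
slightly larger than the input. Real guest pages may compress far less
well than this sample.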


When to use the multiple thread compression in live migration
=============================================================
Compressing data consumes extra CPU cycles, so avoid using this feature
on a system whose CPU is already heavily loaded. When the network
bandwidth is very limited and CPU resources are adequate, multiple
thread compression is very helpful. If both the CPU and the network
bandwidth are adequate, multiple thread compression can still help to
reduce the migration time.

Performance
===========
Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Socket Count: 2
RAM: 128G
NIC: Intel I350 (10/100/1000Mbps)
Host OS: CentOS 7 64-bit
Guest OS: RHEL 6.5 64-bit
Parameter: qemu-system-x86_64 -enable-kvm -smp 4 -m 4096
 /share/ia32e_rhel6u5.qcow -monitor stdio

No additional application is running on the guest during the test.


Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | original  | compress thread: 8
                    |   way     | decompress thread: 2
                    |           | compression level: 1
---------------------------------------------------------------
total time(msec):   |   3333    |  1833
---------------------------------------------------------------
downtime(msec):     |    100    |   27
---------------------------------------------------------------
transferred ram(kB):|  363536   | 107819
---------------------------------------------------------------
throughput(mbps):   |  893.73   | 482.22
---------------------------------------------------------------
total ram(kB):      |  4211524  | 4211524
---------------------------------------------------------------

An application is running on the guest which periodically writes
random numbers to RAM block areas.

Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | original  | compress thread: 8
                    |   way     | decompress thread: 2
                    |           | compression level: 1
---------------------------------------------------------------
total time(msec):   |   37369   | 15989
---------------------------------------------------------------
downtime(msec):     |    337    |  173
---------------------------------------------------------------
transferred ram(kB):|  4274143  | 1699824
---------------------------------------------------------------
throughput(mbps):   |  936.99   | 870.95
---------------------------------------------------------------
total ram(kB):      |  4211524  | 4211524
---------------------------------------------------------------

Usage
=====
1. Verify that both the source and destination QEMU support multiple
thread compression migration:
    {qemu} info migrate_capabilities
    {qemu} ... compress: off ...

2. Activate compression on the source:
    {qemu} migrate_set_capability compress on

3. Set the compression thread count on the source:
    {qemu} migrate_set_parameter compress_threads 12

4. Set the compression level on the source:
    {qemu} migrate_set_parameter compress_level 1

5. Set the decompression thread count on the destination:
    {qemu} migrate_set_parameter decompress_threads 3

6. Start the outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    Capabilities: ... compress: on
    ...

The following are the default settings:
    compress: off
    compress_threads: 8
    decompress_threads: 2
    compress_level: 1 (which means best speed)

So only the first two steps are required before starting the migration
in order to use multiple thread compression. Steps 3 to 5 are needed
only if the default settings are not appropriate.
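
When the same sequence has to be repeated with different settings, the
monitor commands can be generated rather than typed. The following is a
minimal Python sketch under stated assumptions: it only builds the
command strings, using the command and parameter names exactly as they
appear in the steps above, and does not talk to QEMU. The function
monitor_commands and its arguments are hypothetical and are not part of
QEMU.

```python
def monitor_commands(compress_threads=8, compress_level=1,
                     decompress_threads=2,
                     uri="tcp:destination.host:4444"):
    """Hypothetical helper: return (source_cmds, destination_cmds).

    The default values mirror the default settings listed above.
    """
    if not 0 <= compress_level <= 9:
        raise ValueError("compress_level must be between 0 and 9")
    if compress_threads < 1 or decompress_threads < 1:
        raise ValueError("thread counts must be at least 1")

    # Commands to enter in the source monitor, in order.
    source = [
        "migrate_set_capability compress on",
        f"migrate_set_parameter compress_threads {compress_threads}",
        f"migrate_set_parameter compress_level {compress_level}",
        f"migrate -d {uri}",
    ]
    # Commands to enter in the destination monitor before migrating.
    destination = [
        f"migrate_set_parameter decompress_threads {decompress_threads}",
    ]
    return source, destination


if __name__ == "__main__":
    src, dst = monitor_commands(compress_threads=12, decompress_threads=3)
    print("destination monitor:")
    for cmd in dst:
        print("    {qemu} " + cmd)
    print("source monitor:")
    for cmd in src:
        print("    {qemu} " + cmd)
```

The destination commands are printed first because the decompression
thread count must be set before the outgoing migration is started.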

TODO
====
Faster (de)compression methods such as LZ4 and QuickLZ can help to
reduce the CPU consumption of (de)compression. With these faster
methods, fewer (de)compression threads are needed during the
migration.