Use multiple thread (de)compression in live migration
=====================================================
Copyright (C) 2015 Intel Corporation
Author: Liang Li <liang.z.li@intel.com>

This work is licensed under the terms of the GNU GPLv2 or later. See
the COPYING file in the top-level directory.

Contents:
=========
* Introduction
* When to use
* Performance
* Usage
* TODO

Introduction
============
Instead of sending the guest memory directly, this solution compresses
each RAM page before sending it; after it is received, the data is
decompressed. Using compression in live migration can reduce the amount
of data transferred by about 60%, which is very useful when the
bandwidth is limited, and the total migration time can also be reduced
by about 70% in a typical case. In addition, the VM downtime can be
reduced by about 50%. The actual benefit depends on how compressible
the data in the VM is.

Compression consumes additional CPU cycles, and these extra cycles tend
to increase the migration time. On the other hand, the amount of data
transferred decreases, which reduces the total migration time. If
compression is quick enough, the total migration time can therefore be
reduced, and multiple compression threads can be used to accelerate the
compression process.

Zlib decompression is at least 4 times as fast as compression, so if
the source and destination CPUs have equal speed, keeping the
compression thread count at 4 times the decompression thread count
avoids wasting resources.

The compression level controls the trade-off between compression speed
and compression ratio. A higher compression ratio takes more time:
level 0 means no compression, level 1 gives the best compression speed,
and level 9 gives the best compression ratio. Users can select any
level between 0 and 9.


When to use the multiple thread compression in live migration
=============================================================
Compressing data consumes extra CPU cycles, so avoid using this feature
on a system whose CPU is already heavily loaded. When the network
bandwidth is very limited and CPU resources are adequate, multiple
thread compression is very helpful. If both the CPU and the network
bandwidth are adequate, multiple thread compression can still help to
reduce the migration time.
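The level trade-off and the compression/decompression speed asymmetry
described above can be observed with a quick experiment outside of
QEMU. The sketch below is illustrative only: it uses Python's standard
zlib module on a made-up page-sized buffer with an arbitrary iteration
count, and is not how QEMU compresses guest pages internally.

    # Rough look at zlib level/speed/ratio trade-offs on page-sized data.
    # Standalone experiment, not QEMU code; the buffer contents are made up.
    import time
    import zlib

    PAGE_SIZE = 4096                      # typical guest RAM page size
    page = (b"some moderately repetitive guest data " * 200)[:PAGE_SIZE]

    for level in (0, 1, 6, 9):            # 0 = none, 1 = best speed, 9 = best ratio
        start = time.perf_counter()
        for _ in range(10000):
            compressed = zlib.compress(page, level)
        compress_time = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(10000):
            zlib.decompress(compressed)
        decompress_time = time.perf_counter() - start

        ratio = len(compressed) / len(page)
        print(f"level {level}: ratio {ratio:.2f}, "
              f"compress {compress_time:.3f}s, decompress {decompress_time:.3f}s")

On most hosts this shows decompression running several times faster
than compression, which is why the suggested decompression thread count
is a quarter of the compression thread count.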
Performance
===========
Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Socket Count: 2
RAM: 128G
NIC: Intel I350 (10/100/1000Mbps)
Host OS: CentOS 7 64-bit
Guest OS: RHEL 6.5 64-bit
Parameter: qemu-system-x86_64 -accel kvm -smp 4 -m 4096
           /share/ia32e_rhel6u5.qcow -monitor stdio

No additional application is running on the guest while this test is
performed.

Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | original    | compress thread: 8
                    | way         | decompress thread: 2
                    |             | compression level: 1
---------------------------------------------------------------
total time(msec):   |     3333    |       1833
---------------------------------------------------------------
downtime(msec):     |      100    |         27
---------------------------------------------------------------
transferred ram(kB):|   363536    |     107819
---------------------------------------------------------------
throughput(mbps):   |   893.73    |     482.22
---------------------------------------------------------------
total ram(kB):      |  4211524    |    4211524
---------------------------------------------------------------

An application which periodically writes random numbers to RAM block
areas is running on the guest during this test.

Speed limit: 1000Gb/s
---------------------------------------------------------------
                    | original    | compress thread: 8
                    | way         | decompress thread: 2
                    |             | compression level: 1
---------------------------------------------------------------
total time(msec):   |    37369    |      15989
---------------------------------------------------------------
downtime(msec):     |      337    |        173
---------------------------------------------------------------
transferred ram(kB):|  4274143    |    1699824
---------------------------------------------------------------
throughput(mbps):   |   936.99    |     870.95
---------------------------------------------------------------
total ram(kB):      |  4211524    |    4211524
---------------------------------------------------------------

Usage
=====
1. Verify that both the source and destination QEMU binaries support
multiple thread compression migration:
    {qemu} info migrate_capabilities
    {qemu} ... compress: off ...

2. Activate compression on the source:
    {qemu} migrate_set_capability compress on

3. Set the compression thread count on the source:
    {qemu} migrate_set_parameter compress-threads 12

4. Set the compression level on the source:
    {qemu} migrate_set_parameter compress-level 1

5. Set the decompression thread count on the destination:
    {qemu} migrate_set_parameter decompress-threads 3

6. Start the outgoing migration:
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    Capabilities: ... compress: on
    ...
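The steps above use the HMP monitor. A management application that
drives QEMU through QMP can express the same configuration with the
migrate-set-capabilities, migrate-set-parameters and migrate commands.
The following is a minimal sketch, not a definitive client: it assumes
a QMP server was started with something like
"-qmp unix:/tmp/qmp.sock,server,nowait" (the socket path, thread
counts and destination URI are examples only), and it omits error
handling.

    # Minimal QMP sketch: enable multi-thread compression and start the
    # migration over a QMP unix socket.  Socket path and values are examples.
    import json
    import socket

    def qmp_command(chan, name, arguments=None):
        """Send one QMP command and return the parsed reply."""
        cmd = {"execute": name}
        if arguments:
            cmd["arguments"] = arguments
        chan.write(json.dumps(cmd) + "\n")
        chan.flush()
        # Skip asynchronous events until a command reply arrives.
        while True:
            reply = json.loads(chan.readline())
            if "return" in reply or "error" in reply:
                return reply

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect("/tmp/qmp.sock")              # assumed -qmp socket path
    chan = sock.makefile("rw")

    json.loads(chan.readline())                # read the QMP greeting
    qmp_command(chan, "qmp_capabilities")      # enter command mode

    # Equivalent of steps 2-4 on the source side.
    qmp_command(chan, "migrate-set-capabilities",
                {"capabilities": [{"capability": "compress", "state": True}]})
    qmp_command(chan, "migrate-set-parameters",
                {"compress-threads": 12, "compress-level": 1})
    # (decompress-threads would be set the same way on the destination's
    #  QMP socket.)

    # Step 6: start the outgoing migration and check its status.
    qmp_command(chan, "migrate", {"uri": "tcp:destination.host:4444"})
    print(qmp_command(chan, "query-migrate"))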
The following are the default settings:
    compress: off
    compress-threads: 8
    decompress-threads: 2
    compress-level: 1 (which means best speed)

So, only the first two steps are required to use multiple thread
compression in migration. You can adjust the other parameters if the
default settings are not appropriate.

TODO
====
Faster (de)compression methods such as LZ4 and QuickLZ could help to
reduce the CPU consumption of (de)compression. With such faster
methods, fewer (de)compression threads would be needed for the
migration.
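To get a rough idea of how much a faster codec could help, the sketch
below compares zlib level 1 with LZ4 on the same made-up buffer as the
earlier example. It assumes the third-party "lz4" Python package is
installed (pip install lz4); the numbers are purely illustrative and
say nothing about what an LZ4-based QEMU implementation would achieve.

    # Compare zlib (level 1) with LZ4 frame compression on the same buffer.
    # Requires the third-party "lz4" package; buffer contents are made up.
    import time
    import zlib

    import lz4.frame

    data = (b"some moderately repetitive guest data " * 200)[:4096] * 1000

    def measure(name, compress, decompress):
        start = time.perf_counter()
        compressed = compress(data)
        compress_time = time.perf_counter() - start

        start = time.perf_counter()
        decompress(compressed)
        decompress_time = time.perf_counter() - start

        print(f"{name}: ratio {len(compressed) / len(data):.2f}, "
              f"compress {compress_time:.4f}s, decompress {decompress_time:.4f}s")

    measure("zlib level 1", lambda d: zlib.compress(d, 1), zlib.decompress)
    measure("lz4", lz4.frame.compress, lz4.frame.decompress)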