2 RDMA Live Migration Specification, Version # 1
31 data copies by bypassing the host networking stack. In particular, a TCP-based
32 migration, under certain types of memory-bound workloads, may take a more
34 memory tracked during each live migration iteration round cannot keep pace
38 over Converged Ethernet) as well as InfiniBand-based. This implementation of
47 for a working build of QEMU to run successfully using RDMA Migration.
56 of RDMA migration may in fact be harmful to co-located VMs or other
65 bulk-phase round of the migration and can be enabled for extremely
66 high-performance RDMA hardware using the following command:
69 $ migrate_set_capability rdma-pin-all on # disabled by default
92 $ migrate_set_parameter max-bandwidth 40g # or the maximum bandwidth of your RDMA device
96 qemu ..... -incoming rdma:host:port
101 $ migrate -d rdma:host:port
106 Here is a brief summary of total migration time and downtime using RDMA:
107 Using a 40 Gbps InfiniBand link performing a worst-case stress test,
111 $ apt-get install stress
112 $ stress --vm-bytes 7500M --vm 1 --vm-keep
123 1. rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps
124 2. rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps
132 the bulk round and does not need to be re-registered during the successive
141 2. Everything else (a control channel is introduced)
143 "Everything else" is transmitted using a formal
148 The only difference between a SEND message and an RDMA
170 receiver must have reserved space (using a receive
173 a control transport for migration of device state.
176 as follows (migration-rdma.c):
185 At this point, we define a control channel on top of SEND messages
186 which is described by a formal protocol. Each SEND message has a
187 header portion and a data portion (but together are transmitted
188 as a single SEND message).
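The header-plus-data framing described above can be sketched as follows. This is a minimal illustration only: the header layout (three big-endian 32-bit fields named length, type and repeat) and the helper names are assumptions made for the example, not the layout used by migration-rdma.c. The 4096 repeat limit is the value this document gives as hard-coded.

```python
import struct

# Assumed (hypothetical) header layout: data length, command type, repeat
# count, each an unsigned 32-bit integer in network byte order.
HEADER = struct.Struct("!III")

# The document states that the maximum number of repeats is hard-coded to 4096.
MAX_REPEAT = 4096


def pack_control_message(cmd_type, data=b"", repeat=1):
    """Build one buffer holding header and data, as a single SEND would."""
    if not 0 < repeat <= MAX_REPEAT:
        raise ValueError("repeat count out of range")
    return HEADER.pack(len(data), cmd_type, repeat) + data


def unpack_control_message(buf):
    """Split a received buffer back into (type, repeat, data)."""
    length, cmd_type, repeat = HEADER.unpack_from(buf)
    data = buf[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("truncated control message")
    return cmd_type, repeat, data
```

Because the header carries the length of the data portion, the receiver can split the two parts without any delimiter, even though both arrive in one message.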
196 in a single message without any need to change the protocol itself
201 The maximum number of repeats is hard-coded to 4096. This is a conservative
202 limit based on the maximum size of a SEND message along with empirical
208 3. Ready (control-channel is available)
209 4. QEMU File (for sending non-live device state)
219 A single control message, as hinted above, can contain within the data
226 After ram block exchange is completed, we have two protocol-level
227 functions, responsible for communicating control-channel commands
234 1. We transmit a READY command to let the sender know that
238 3. Block on a CQ event channel and wait for the SEND to arrive.
240 5. Verify that the command-type and version received match the ones we expected.
244 1. Block on the CQ event channel waiting for a READY command
247 2. Optionally: if we are expecting a response from the command
249 work request to receive that data a few moments later.
251 unblock us and we immediately post a RQ work request
255 5. Optionally, if we are expecting a response (as before),
268 a description of each RAMBlock on the server side as well as the virtual addresses
272 2. During runtime, once a 'chunk' becomes full of pages ready to
277 when transmitting non-live state, such as devices or to send
279 4. Finally, zero pages are only checked if a page has not yet been registered
284 zero, then we send a compress command to zap the page on the other side.
294 librdmacm provides the user with a 'private data' area to be exchanged
295 at connection-setup time before any InfiniBand traffic is generated.
308 This private data area is a convenient place to check for protocol
310 transmit a few bytes of version information.
312 This is also a convenient place to negotiate capabilities
322 Finally: Negotiation happens with the Flags field: If the primary-VM
323 sets a flag, but the destination does not support this capability, it
324 will return a zero-bit for that flag and the primary-VM will understand
326 capability on the primary-VM side.
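The flag negotiation described above amounts to a bitwise intersection. The sketch below assumes hypothetical bit assignments (`CAP_PIN_ALL`, `CAP_EXAMPLE`) and hypothetical function names; only the behavior (an unsupported flag comes back as a zero bit and is then disabled on the primary-VM side) is taken from the text.

```python
# Hypothetical bit assignments, chosen only for this illustration.
CAP_PIN_ALL = 1 << 0
CAP_EXAMPLE = 1 << 1


def destination_reply(requested, supported):
    """Destination returns a zero bit for every flag it does not support."""
    return requested & supported


def source_apply(requested, reply):
    """Primary-VM side: keep acknowledged flags, report the ones to disable."""
    enabled = requested & reply
    disabled = requested & ~reply
    return enabled, disabled


# Example: the source asks for both capabilities, the destination knows one.
reply = destination_reply(CAP_PIN_ALL | CAP_EXAMPLE, CAP_PIN_ALL)
enabled, disabled = source_apply(CAP_PIN_ALL | CAP_EXAMPLE, reply)
```

With this scheme, an older destination needs no knowledge of newer flags: it simply never echoes them back.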
331 QEMUFileRDMA introduces a couple of new functions:
337 described above to deliver bytes without changing the upper-level
338 users of QEMUFile that depend on a bytestream abstraction.
342 Again, because we're trying to "fake" a bytestream abstraction
344 to hold on to the bytes received from control-channel's SEND
347 Each time we receive a complete "QEMU File" control-channel
348 message, the bytes from SEND are copied into a small local holding area.
356 asking for a new SEND message to re-fill the buffer.
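The holding-area behavior described above can be modeled in a few lines. This is a sketch under stated assumptions: the class name `HoldingBuffer` and the `fetch_message` callback (standing in for "ask for a new SEND message") are hypothetical, and an empty return value is used here to mean that no more messages will arrive.

```python
class HoldingBuffer:
    """Presents a stream of discrete messages as a byte stream."""

    def __init__(self, fetch_message):
        # fetch_message: hypothetical callable returning the data bytes of
        # the next control-channel message, or b"" when none are left.
        self._fetch = fetch_message
        self._buf = b""
        self._pos = 0

    def read(self, size):
        """Return up to `size` bytes, refilling the buffer when it is empty."""
        out = bytearray()
        while len(out) < size:
            if self._pos == len(self._buf):
                # Holding area drained: ask for a new message to refill it.
                self._buf = self._fetch()
                self._pos = 0
                if not self._buf:
                    break
            take = min(size - len(out), len(self._buf) - self._pos)
            out += self._buf[self._pos:self._pos + take]
            self._pos += take
        return bytes(out)
```

A caller reading three bytes at a time never sees the message boundaries; leftover bytes from one message are served before the next message is requested.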
361 At the beginning of the migration (migration-rdma.c),
363 to be registered with each other into a structure.
364 Then, using the aforementioned protocol, they exchange a
367 a list of all the RAMBlocks, their offsets and lengths, virtual
368 addresses and possibly includes pre-registered RDMA keys in case dynamic
369 page registration was disabled on the server side, otherwise not.
374 Pages are migrated in "chunks" (hard-coded to 1 Megabyte right now).
375 Chunk size is not dynamic, but it could be in a future implementation.
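To make the fixed chunk size concrete, the following sketch maps a byte offset within a RAMBlock to its chunk and back. It assumes that "1 Megabyte" means 2^20 bytes and that chunks are counted from the start of the block; the function names are hypothetical and not taken from the implementation.

```python
# Assumption: "1 Megabyte" is interpreted as 2**20 bytes.
CHUNK_SIZE = 1 << 20


def chunk_index(offset):
    """Index of the chunk containing a byte offset within a block."""
    return offset // CHUNK_SIZE


def chunk_count(block_length):
    """Number of chunks needed to cover a block (last one may be partial)."""
    return (block_length + CHUNK_SIZE - 1) // CHUNK_SIZE


def chunk_range(index, block_length):
    """Half-open byte range [start, end) covered by a chunk, clipped to the block."""
    start = index * CHUNK_SIZE
    end = min(start + CHUNK_SIZE, block_length)
    return start, end
```

A dynamic chunk size, which the text mentions as a possible future change, would turn `CHUNK_SIZE` into a negotiated parameter rather than a constant.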
378 When a chunk is full (or a flush() occurs), the memory backed by
388 Only the last chunk in a batch must be signaled.
389 This helps keep everything as asynchronous as possible
390 and helps keep the hardware busy performing RDMA operations.
392 Error-handling:
395 InfiniBand has what is called a "Reliable, Connected"
399 If a *single* message fails,
406 socket is broken during a non-RDMA based migration.
410 1. Currently, 'ulimit -l' mlock() limits as well as cgroups swap limits
415 3. Some form of balloon-device usage tracking would also
417 4. Use LRU to provide more fine-grained direction of UNREGISTER
419 5. Expose UNREGISTER support to the user by way of workload-specific