Live migration, also called migration, refers to the process of moving a running virtual machine (VM) or application between different physical machines without disconnecting the client or application. The memory, storage, and network connectivity of the virtual machine are transferred from the original host machine to the destination. The time between stopping the VM or application on the source and resuming it on the destination is called 'downtime'. When the downtime during live migration is short enough that the end user does not notice it, the migration is called a 'seamless' live migration.
Two techniques for moving the virtual machine's memory state from the source to the destination are pre-copy memory migration and post-copy memory migration.
In pre-copy memory migration,[1] the hypervisor first copies all the memory pages from the source to the destination while the VM is still running on the source; this is the pre-copy phase. If some memory pages change (become 'dirty') during the pre-copy phase, they are copied again over one or more further 'pre-copy rounds'. Usually the pre-copy phase ends when the number of dirty pages remaining becomes small enough to yield a short stop-and-copy phase. However, if a VM keeps dirtying memory faster than the pages can be re-copied to the destination, the pre-copy phase ends after a set time limit or a maximum number of pre-copy rounds, and the stop-and-copy phase begins.
After the pre-copy phase, the VM will be paused on the source host, the remaining dirty pages will be copied to the destination, and the VM will be resumed at the destination. The downtime due to this phase can range from a few milliseconds to seconds depending on the number of dirty pages transferred during downtime. VMs that dirty a lot of memory during the pre-copy phase tend to have a larger downtime.
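The pre-copy rounds and the final stop-and-copy phase can be sketched with a small counting model. This is an illustrative simulation, not a hypervisor implementation: the function name `simulate_precopy` and the parameters `dirty_ratio`, `stop_threshold`, and `max_rounds` are hypothetical, and the model assumes that the number of pages dirtied during a round is proportional to the number of pages sent in that round.

```python
def simulate_precopy(total_pages, dirty_ratio, stop_threshold, max_rounds):
    """Count pages sent by a simplified pre-copy migration.

    Assumption (not from any real hypervisor): while a round sends n pages,
    the running VM dirties int(n * dirty_ratio) pages, capped at total_pages.
    """
    to_send = total_pages          # the first round copies every page
    sent_while_running = 0         # pages copied while the VM still runs
    rounds = 0

    while rounds < max_rounds:     # hard limit for VMs that never converge
        rounds += 1
        sent_while_running += to_send
        # Pages dirtied during this round must be sent again.
        to_send = min(total_pages, int(to_send * dirty_ratio))
        if to_send <= stop_threshold:
            break                  # few enough dirty pages: stop pre-copying

    # Stop-and-copy: the VM is paused and the remaining dirty pages are sent.
    return {
        "rounds": rounds,
        "sent_while_running": sent_while_running,
        "stop_and_copy_pages": to_send,
    }


if __name__ == "__main__":
    # Converging workload: each round dirties half as many pages as it sent.
    print(simulate_precopy(1000, 0.5, stop_threshold=50, max_rounds=30))
    # Non-converging workload: pages are dirtied faster than they are copied.
    print(simulate_precopy(1000, 1.5, stop_threshold=50, max_rounds=4))
```

In the first call the model stops after 5 rounds, having sent 1937 pages while the VM was running and leaving 31 pages for the stop-and-copy phase. In the second call the round limit is reached with all 1000 pages still dirty, which corresponds to the larger downtime of VMs that dirty a lot of memory.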
Post-copy[2] VM migration is initiated by suspending the VM at the source. With the VM suspended, a minimal subset of the execution state of the VM (CPU state, registers and, optionally, non-pageable memory) is transferred to the target. The VM is then resumed at the target. Concurrently, the source actively pushes the remaining memory pages of the VM to the target, an activity known as pre-paging. At the target, if the VM tries to access a page that has not yet been transferred, it generates a page fault. These faults, known as network faults, are trapped at the target and redirected to the source, which responds with the faulted page. Too many network faults can degrade the performance of applications running inside the VM. Hence pre-paging can dynamically adapt the page transmission order to network faults by actively pushing pages in the vicinity of the last fault. An ideal pre-paging scheme would mask the large majority of network faults, although its performance depends upon the memory access pattern of the VM's workload.
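The interaction between network faults and adaptive pre-paging can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the function `simulate_postcopy`, its parameters, and the timing model in which the source pushes `push_per_step` pages between consecutive VM memory accesses. Real implementations differ in how they pick pages around a fault.

```python
def simulate_postcopy(num_pages, accesses, push_per_step, adaptive=True):
    """Return (network_faults, sends_per_page) for a simplified post-copy run.

    accesses is the sequence of page numbers the resumed VM touches.
    With adaptive=True, the push cursor jumps to just after the last fault.
    """
    present = [False] * num_pages   # pages already at the target
    sends = [0] * num_pages         # how often each page crossed the network
    cursor = 0                      # next candidate page for pre-paging
    faults = 0

    def send(page):
        present[page] = True
        sends[page] += 1

    def push_next():
        """Push the next page at or after the cursor that is still missing."""
        nonlocal cursor
        for _ in range(num_pages):
            page = cursor
            cursor = (cursor + 1) % num_pages
            if not present[page]:
                send(page)
                return

    for page in accesses:
        if not present[page]:
            faults += 1             # network fault: fetch the page on demand
            send(page)
            if adaptive:
                cursor = (page + 1) % num_pages  # push near the last fault
        for _ in range(push_per_step):
            push_next()             # background pre-paging by the source

    while not all(present):         # finish transferring the remaining pages
        push_next()
    return faults, sends


if __name__ == "__main__":
    trace = list(range(50, 100))    # VM scans the upper half of its memory
    print(simulate_postcopy(100, trace, 1, adaptive=True)[0])
    print(simulate_postcopy(100, trace, 1, adaptive=False)[0])
```

For this sequential trace, the adaptive variant incurs a single network fault, because pre-paging stays one page ahead of the VM, while the fixed-order variant faults on all 50 accesses. In both cases every page is sent exactly once.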
Post-copy sends each page exactly once over the network whereas pre-copy can transfer the same page multiple times if the page is dirtied repeatedly at the source during migration. On the other hand, pre-copy retains an up-to-date state of the VM at the source during migration, whereas during post-copy, the VM's state is split across the source and the destination. If the destination fails during live migration, pre-copy can recover the VM, whereas post-copy cannot.