.. SPDX-License-Identifier: GPL-2.0

============
NET_FAILOVER
============

Overview
========

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev, and it manages primary and
standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts as a master device and controls two slave devices.
The original paravirtual interface is registered as the 'standby' slave netdev,
and a passthru/VF device with the same MAC address gets registered as the
'primary' slave netdev. Both the 'standby' and 'failover' netdevs are
associated with the same 'pci' device. The user accesses the network interface
via the 'failover' netdev. The 'failover' netdev chooses the 'primary' netdev
as the default for transmits when it is available with link up and running.

This can be used by paravirtual drivers to enable an alternate low-latency
datapath. It also enables hypervisor-controlled live migration of a VM with a
directly attached VF by failing over to the paravirtual datapath when the VF
is unplugged.
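The transmit-selection rule described above (prefer 'primary' when it is available with link up and running, otherwise use 'standby') can be modeled in a few lines of shell. This is a hypothetical approximation, not the driver code: the driver makes this decision in C, while the sketch approximates "link up and running" by each slave's sysfs `operstate`. The function name `pick_tx_dev` and its arguments are made up for illustration, and the sysfs root is a parameter so the logic can be tried against a fake directory tree.

```shell
#!/bin/bash
# Hypothetical approximation of the 'failover' netdev's transmit choice.
# "Available with link up and running" is approximated here by a sysfs
# operstate of "up"; the real decision is made inside the driver.
# Usage: pick_tx_dev SYSFS_NET_ROOT PRIMARY STANDBY
# Prints the slave that would carry transmits, or "none".
pick_tx_dev() {
    local root=$1 primary=$2 standby=$3 dev file state
    # Try the primary (VF) first, then the standby (virtio-net).
    for dev in "$primary" "$standby"; do
        file=$root/$dev/operstate
        state=
        if [ -n "$dev" ] && [ -r "$file" ]; then
            state=$(cat "$file")
        fi
        if [ "$state" = up ]; then
            echo "$dev"
            return 0
        fi
    done
    echo none
    return 1
}
```

For the guest interfaces shown later in this document, `pick_tx_dev /sys/class/net ens11 ens10nsby` would be expected to print `ens11` while the VF is attached and up, and `ens10nsby` once the VF is gone and the virtio-net link has been brought up.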

virtio-net accelerated datapath: STANDBY mode
=============================================

net_failover enables a hypervisor-controlled accelerated datapath for
virtio-net enabled VMs in a transparent manner, with no or minimal guest
userspace changes.

To support this, the hypervisor needs to enable the VIRTIO_NET_F_STANDBY
feature on the virtio-net interface and assign the same MAC address to both
the virtio-net and VF interfaces.

Here is an example libvirt XML snippet that shows such a configuration::

  <interface type='network'>
    <mac address='52:54:00:00:12:53'/>
    <source network='enp66s0f0_br'/>
    <target dev='tap01'/>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
    <link state='down'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

In this configuration, the first device definition is for the virtio-net
interface, which acts as the 'persistent' device, indicating that this
interface will always be plugged in. This is specified by the 'teaming' tag
with the required attribute type having the value 'persistent'. The link state
for the virtio-net device is set to 'down' to ensure that the 'failover' netdev
prefers the VF passthrough device for normal communication. The virtio-net
device will be brought UP during live migration to allow uninterrupted
communication.

The second device definition is for the VF passthrough interface. Here the
'teaming' tag is provided with type 'transient', indicating that this device
may periodically be unplugged. A second attribute, 'persistent', is provided
and points to the alias name declared for the virtio-net device.

Booting a VM with the above configuration will result in the following three
interfaces being created in the VM::

  4: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
      inet 192.168.12.53/24 brd 192.168.12.255 scope global dynamic ens10
         valid_lft 42482sec preferred_lft 42482sec
      inet6 fe80::97d8:db2:8c10:b6d6/64 scope link
         valid_lft forever preferred_lft forever
  5: ens10nsby: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master ens10 state DOWN group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
  7: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ens10 state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff

Here, ens10 is the 'failover' master interface, ens10nsby is the slave
'standby' virtio-net interface, and ens11 is the slave 'primary' VF
passthrough interface.

Note that some userspace network configuration daemons, such as
systemd-networkd and ifupdown, do not understand the 'net_failover' device.
On the first boot, the VM might therefore end up with both the 'failover'
device and the VF acquiring IP addresses (either the same or different) from
the DHCP server, which results in a lack of connectivity to the VM. Some tweaks
might be needed in these network configuration daemons to make sure that an IP
address is obtained only on the 'failover' device.
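A configuration daemon that should configure only the 'failover' device needs to recognize its slaves. One way is the sysfs `master` link, the same check used by the cloud-ifupdown-helper patch in this document: an interface enslaved to an upper device has a `master` entry under `/sys/class/net/<name>/`. The helper below is a hypothetical sketch (the name `failover_roles` and its output format are made up); it takes the sysfs root as an optional argument so it can be tried against a fake tree.

```shell
#!/bin/bash
# Hypothetical helper: print "<interface> <master>" for every netdev under a
# sysfs net directory, or "<interface> -" when it has no master.
# Usage: failover_roles [SYSFS_NET_ROOT]   (defaults to /sys/class/net)
failover_roles() {
    local root=${1:-/sys/class/net} dir name target
    for dir in "$root"/*; do
        [ -e "$dir" ] || continue          # empty directory: glob did not match
        name=${dir##*/}
        if [ -e "$dir/master" ]; then
            # Enslaved device: the last component of the link target is
            # the master's interface name.
            target=$(readlink "$dir/master")
            echo "$name ${target##*/}"
        else
            echo "$name -"
        fi
    done
}
```

In the guest shown above, ens10nsby and ens11 would be expected to be reported with master ens10, while ens10 has no master and is the interface that should receive an IP address. The `master` link is not specific to net_failover (bridge and bond ports have one too), so a real daemon may need additional checks.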

Below is the patch snippet used with the 'cloud-ifupdown-helper' script found
on Debian cloud images. It makes the script exit without generating a
configuration for any interface that has a master device, such as the
'standby' and 'primary' slaves, so that only the 'failover' device gets
configured::

  @@ -27,6 +27,8 @@ do_setup() {
       local working="$cfgdir/.$INTERFACE"
       local final="$cfgdir/$INTERFACE"

  +    if [ -d "/sys/class/net/${INTERFACE}/master" ]; then exit 0; fi
  +
       if ifup --no-act "$INTERFACE" > /dev/null 2>&1; then
           # interface is already known to ifupdown, no need to generate cfg
           log "Skipping configuration generation for $INTERFACE"

Live Migration of a VM with SR-IOV VF & virtio-net in STANDBY mode
==================================================================

net_failover also enables hypervisor-controlled live migration of VMs that
have directly attached SR-IOV VF devices, by automatically failing over to the
paravirtual datapath when the VF is unplugged.

Here is a sample script that shows the steps to initiate live migration from
the source hypervisor. Note: it is assumed that the VM is connected to a
software bridge 'br0', which has a single VF attached to it along with the
vnet device to the VM. This is not the VF that was passed through to the VM
(seen in the vf.xml file)::

  # cat vf.xml
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

  # Source Hypervisor migrate.sh
  #!/bin/bash

  DOMAIN=vm-01
  PF=ens6np0
  VF=ens6v1             # VF attached to the bridge.
  VF_NUM=1
  TAP_IF=vmtap01        # virtio-net interface in the VM.
  VF_XML=vf.xml
  # Destination hypervisor; must be set in the environment before running.
  REMOTE_HOST=${REMOTE_HOST:?set REMOTE_HOST to the destination hypervisor}

  MAC=52:54:00:00:12:53
  ZERO_MAC=00:00:00:00:00:00

  # Set the virtio-net interface up.
  virsh domif-setlink $DOMAIN $TAP_IF up

  # Remove the VF that was passed through to the VM.
  virsh detach-device --live --config $DOMAIN $VF_XML

  ip link set $PF vf $VF_NUM mac $ZERO_MAC

  # Add FDB entry for traffic to continue going to the VM via
  # the VF -> br0 -> vnet interface path.
  bridge fdb add $MAC dev $VF
  bridge fdb add $MAC dev $TAP_IF master

  # Migrate the VM.
  virsh migrate --live --persistent $DOMAIN qemu+ssh://$REMOTE_HOST/system

  # Clean up FDB entries after migration completes.
  bridge fdb del $MAC dev $VF
  bridge fdb del $MAC dev $TAP_IF master

On the destination hypervisor, a shared bridge 'br0' is created before
migration starts, and a VF from the destination PF is added to the bridge.
Similarly, an appropriate FDB entry is added.

The following script is executed on the destination hypervisor once migration
completes; it reattaches the VF to the VM and brings down the virtio-net
interface::

  # reattach-vf.sh
  #!/bin/bash

  bridge fdb del 52:54:00:00:12:53 dev ens36v0
  bridge fdb del 52:54:00:00:12:53 dev vmtap01 master
  virsh attach-device --config --live vm-01 vf.xml
  virsh domif-setlink vm-01 vmtap01 down
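The destination-side preparation mentioned above (a shared bridge 'br0', a VF from the destination PF added to it, and an FDB entry) is described only in prose. The following is a hedged sketch of how it might look with iproute2, not a tested procedure: the VF name ens36v0 is taken from reattach-vf.sh, while the function names, the `DRY_RUN` switch and the exact set of commands are assumptions, and the FDB entries needed depend on the actual setup. With `DRY_RUN=1` it only prints the commands.

```shell
#!/bin/bash
# Hypothetical sketch of the destination-hypervisor preparation.
# Run with DRY_RUN=1 to print the commands instead of executing them.
BRIDGE=br0
DEST_VF=ens36v0            # VF from the destination PF (name from reattach-vf.sh)
MAC=52:54:00:00:12:53      # MAC address shared by the VM's interfaces

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_destination() {
    # Shared software bridge that the VM's vnet/tap device will join.
    run ip link add name "$BRIDGE" type bridge
    # Attach a VF from the destination PF to the bridge.
    run ip link set dev "$DEST_VF" master "$BRIDGE"
    run ip link set dev "$DEST_VF" up
    run ip link set dev "$BRIDGE" up
    # FDB entry for the VM's MAC on the VF, mirroring migrate.sh on the
    # source, so traffic can reach the VM via the VF -> br0 -> vnet path.
    run bridge fdb add "$MAC" dev "$DEST_VF"
}
```

Running `DRY_RUN=1 prepare_destination` first is a cheap way to review the commands before applying them as root; the matching `bridge fdb del` for the VF is already part of reattach-vf.sh.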