.. SPDX-License-Identifier: GPL-2.0

============
NET_FAILOVER
============

Overview
========

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev, and manages the primary and
standby slave netdevs that are registered via the generic failover
infrastructure.

The failover netdev acts as a master device and controls two slave devices. The
original paravirtual interface is registered as the 'standby' slave netdev and
a passthru/VF device with the same MAC is registered as the 'primary' slave
netdev. Both the 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via the 'failover' netdev.
The 'failover' netdev chooses the 'primary' netdev as the default for transmits
when it is available with link up and running.

This can be used by paravirtual drivers to enable an alternate low-latency
datapath. It also enables hypervisor-controlled live migration of a VM with a
direct-attached VF by failing over to the paravirtual datapath when the VF
is unplugged.

virtio-net accelerated datapath: STANDBY mode
=============================================

net_failover enables a hypervisor-controlled accelerated datapath for virtio-net
enabled VMs in a transparent manner, with minimal or no guest userspace changes.

To support this, the hypervisor needs to enable the VIRTIO_NET_F_STANDBY
feature on the virtio-net interface and assign the same MAC address to both
the virtio-net and VF interfaces.
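
Whether the feature was actually negotiated can be checked from inside the
guest. The following is a minimal shell sketch and not part of net_failover
itself. It assumes that the guest exposes the virtio 'features' attribute under
/sys/class/net/<iface>/device/ as a string of '0' and '1' characters with
bit 0 first, and that VIRTIO_NET_F_STANDBY is feature bit 62. The function
names are made up for illustration::

  # Assumed bit number of VIRTIO_NET_F_STANDBY in the virtio feature set.
  STANDBY_BIT=62

  # has_standby_bit FEATURES
  # Succeeds if the given feature bitstring has the standby bit set.
  has_standby_bit() {
      # cut counts characters from 1, feature bits count from 0.
      bit=$(printf '%s' "$1" | cut -c"$((STANDBY_BIT + 1))")
      [ "$bit" = "1" ]
  }

  # standby_enabled IFACE
  # Succeeds if the (assumed) sysfs attribute of IFACE has the bit set.
  standby_enabled() {
      attr="/sys/class/net/$1/device/features"
      [ -r "$attr" ] && has_standby_bit "$(cat "$attr")"
  }

In a guest, 'standby_enabled IFACE || echo "STANDBY not negotiated"' could then
serve as a quick sanity check before relying on failover, where IFACE stands for
the name of the virtio-net interface.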

Here is an example libvirt XML snippet that shows such a configuration::

  <interface type='network'>
    <mac address='52:54:00:00:12:53'/>
    <source network='enp66s0f0_br'/>
    <target dev='tap01'/>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
    <link state='down'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

In this configuration, the first device definition is for the virtio-net
interface, which acts as the 'persistent' device, indicating that this
interface will always be plugged in. This is specified by the 'teaming' tag,
whose required 'type' attribute has the value 'persistent'. The link state for
the virtio-net device is set to 'down' to ensure that the 'failover' netdev
prefers the VF passthrough device for normal communication. The virtio-net
device will be brought up during live migration to allow uninterrupted
communication.

The second device definition is for the VF passthrough interface. Here the
'teaming' tag is given the type 'transient', indicating that this device may
periodically be unplugged. A second attribute, 'persistent', is provided and
points to the alias name declared for the virtio-net device.
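
Because both definitions must carry the same MAC address, a quick consistency
check before booting the VM can catch typos. The following is a rough sketch
only: it is a plain-text match with grep rather than a real XML parser, it
assumes single-quoted attributes as in the snippet above, and the function name
is hypothetical::

  # same_mac XMLFILE
  # Succeeds if every <mac address='...'/> element in XMLFILE carries the
  # same address (compared case-insensitively). Fails if none is found.
  same_mac() {
      count=$(grep -o "<mac address='[^']*'" "$1" |
              tr 'A-Z' 'a-z' | sort -u | wc -l)
      [ "$count" -eq 1 ]
  }

The check is only meaningful on a file that holds just the two interface
definitions of the failover pair, since other interfaces of the domain
legitimately use different MAC addresses.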

Booting a VM with the above configuration will result in the following three
interfaces being created in the VM::

  4: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
      inet 192.168.12.53/24 brd 192.168.12.255 scope global dynamic ens10
         valid_lft 42482sec preferred_lft 42482sec
      inet6 fe80::97d8:db2:8c10:b6d6/64 scope link
         valid_lft forever preferred_lft forever
  5: ens10nsby: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master ens10 state DOWN group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
  7: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ens10 state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff

Here, ens10 is the 'failover' master interface, ens10nsby is the slave 'standby'
virtio-net interface, and ens11 is the slave 'primary' VF passthrough interface.
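
The same relationship can be read from sysfs, where each slave netdev has a
'master' link pointing at its master. The following is a small illustrative
sketch. The function name and the SYSFS_NET override (handy for trying the
function against a fake directory tree) are assumptions of this example::

  # Directory that holds one entry per netdev; overridable for testing.
  SYSFS_NET=${SYSFS_NET:-/sys/class/net}

  # list_slaves MASTER
  # Prints the name of every netdev whose 'master' link points at MASTER.
  list_slaves() {
      for link in "$SYSFS_NET"/*/master; do
          # Skip the unexpanded pattern when no netdev has a master.
          [ -e "$link" ] || continue
          if [ "$(basename "$(readlink -f "$link")")" = "$1" ]; then
              basename "$(dirname "$link")"
          fi
      done
  }

In the VM shown above, 'list_slaves ens10' would be expected to print ens10nsby
and ens11, one per line.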

One point to note here is that some user space network configuration daemons
like systemd-networkd, ifupdown, etc., do not understand the 'net_failover'
device. On the first boot, the VM might end up with both the 'failover' device
and the VF acquiring IP addresses (either the same or different) from the DHCP
server. This results in a lack of connectivity to the VM, so some tweaks might
be needed to these network configuration daemons to make sure that an IP address
is received only on the 'failover' device.

Below is the patch snippet used with the 'cloud-ifupdown-helper' script found on
Debian cloud images::

  @@ -27,6 +27,8 @@ do_setup() {
       local working="$cfgdir/.$INTERFACE"
       local final="$cfgdir/$INTERFACE"

  +    if [ -d "/sys/class/net/${INTERFACE}/master" ]; then exit 0; fi
  +
       if ifup --no-act "$INTERFACE" > /dev/null 2>&1; then
           # interface is already known to ifupdown, no need to generate cfg
           log "Skipping configuration generation for $INTERFACE"

With this change, the helper exits early for any interface that has a master
device, which includes the 'primary' and 'standby' slaves, so that a
configuration is generated only for the 'failover' device.


Live Migration of a VM with SR-IOV VF & virtio-net in STANDBY mode
==================================================================

net_failover also enables hypervisor-controlled live migration to be supported
with VMs that have direct-attached SR-IOV VF devices by automatic failover to
the paravirtual datapath when the VF is unplugged.

Here is a sample script that shows the steps to initiate live migration from
the source hypervisor. Note: It is assumed that the VM is connected to a
software bridge 'br0' which has a single VF attached to it along with the vnet
device to the VM. This is not the VF that was passed through to the VM (seen in
the vf.xml file)::

  # cat vf.xml
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

  # Source Hypervisor migrate.sh
  #!/bin/bash

  DOMAIN=vm-01
  PF=ens6np0
  VF=ens6v1             # VF attached to the bridge.
  VF_NUM=1
  TAP_IF=vmtap01        # virtio-net interface in the VM.
  VF_XML=vf.xml

  MAC=52:54:00:00:12:53
  ZERO_MAC=00:00:00:00:00:00

  # REMOTE_HOST (the destination hypervisor) must be set by the caller.
  : "${REMOTE_HOST:?REMOTE_HOST is not set}"

  # Set the virtio-net interface up.
  virsh domif-setlink $DOMAIN $TAP_IF up

  # Remove the VF that was passed through to the VM.
  virsh detach-device --live --config $DOMAIN $VF_XML

  # Reset the MAC address of VF $VF_NUM on the PF.
  ip link set $PF vf $VF_NUM mac $ZERO_MAC

  # Add FDB entry for traffic to continue going to the VM via
  # the VF -> br0 -> vnet interface path.
  bridge fdb add $MAC dev $VF
  bridge fdb add $MAC dev $TAP_IF master

  # Migrate the VM
  virsh migrate --live --persistent $DOMAIN qemu+ssh://$REMOTE_HOST/system

  # Clean up FDB entries after migration completes.
  bridge fdb del $MAC dev $VF
  bridge fdb del $MAC dev $TAP_IF master

On the destination hypervisor, a shared bridge 'br0' is created before migration
starts, and a VF from the destination PF is added to the bridge. Similarly, an
appropriate FDB entry is added.
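
The destination-side preparation is not spelled out as a script in this
document. The following sketch shows one way it might look. It is an
illustration under stated assumptions, not a tested procedure: the VF name
ens36v0 and the MAC address are taken from the other examples here, only the FDB
entry for the VF is added, and the 'run' wrapper with its DRY_RUN switch is a
helper invented for this example::

  # All names are assumptions; adjust them to the destination host.
  BRIDGE=br0
  VF=ens36v0              # VF of the destination PF that joins the bridge.
  MAC=52:54:00:00:12:53   # MAC address shared by virtio-net and VF.

  # With DRY_RUN set, commands are printed instead of executed.
  run() {
      if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
  }

  # Create the bridge, attach the VF to it and add the FDB entry.
  prepare_destination() {
      run ip link add name "$BRIDGE" type bridge
      run ip link set dev "$BRIDGE" up
      run ip link set dev "$VF" master "$BRIDGE"
      run ip link set dev "$VF" up
      run bridge fdb add "$MAC" dev "$VF"
  }

Calling 'prepare_destination' with DRY_RUN=1 set prints the five commands
without touching the host, which makes it easy to review them before running
them as root.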

The following script is executed on the destination hypervisor once migration
completes. It reattaches the VF to the VM and brings down the virtio-net
interface::

  # reattach-vf.sh
  #!/bin/bash

  bridge fdb del 52:54:00:00:12:53 dev ens36v0
  bridge fdb del 52:54:00:00:12:53 dev vmtap01 master
  virsh attach-device --config --live vm-01 vf.xml
  virsh domif-setlink vm-01 vmtap01 down