=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
feature supports Ethernet functionality over the Omni-Path fabric by
encapsulating Ethernet packets between HFI nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets involves one or
more virtual Ethernet switches overlaid on the Omni-Path fabric topology.
A subset of HFI nodes on the Omni-Path fabric is permitted to exchange
encapsulated Ethernet packets across a particular virtual Ethernet switch.
The virtual Ethernet switches are logical abstractions achieved by
configuring the HFI nodes on the fabric for header generation and
processing. In the simplest configuration, all HFI nodes across the fabric
exchange encapsulated Ethernet packets over a single virtual Ethernet
switch. A virtual Ethernet switch is effectively an independent Ethernet
network. The configuration is performed by an Ethernet Manager (EM), which
is part of the trusted Fabric Manager (FM) application. HFI nodes can have
multiple VNICs, each connected to a different virtual Ethernet switch.
The diagram below shows a case of two virtual Ethernet switches with two
HFI nodes::

                               +-------------------+
                               |      Subnet/      |
                               |     Ethernet      |
                               |      Manager      |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    | VPORT   | |  | | VPORT   |    | VPORT   |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+


The Omni-Path encapsulated Ethernet packet format is as described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet Packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

The Ethernet packet is padded on the transmit side to ensure that the VNIC
OPA packet is quad word aligned. The 'Tail' field contains the number of
bytes padded.
On the receive side, the 'Tail' field is read and the padding is
removed (along with the ICRC, Tail and OPA header) before the packet is
passed up the network stack.

The L4 header field contains the ID of the virtual Ethernet switch that the
VNIC port belongs to. On the receive side, this field is used to
de-multiplex the received VNIC packets to the different VNIC ports.

Driver Design
==============
The Intel OPA VNIC software design is presented in the diagram below.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for IB devices to allocate and free RDMA netdev
devices. The RDMA netdev supports interfacing with the network stack,
thus creating standard network interfaces. OPA_VNIC is an RDMA netdev
device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges the management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation.
It also handles the encapsulation of Ethernet packets with an
Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the
VEMA MAD interface. It also passes any control information to the HW
dependent driver by invoking the RDMA netdev control operations::

  +-------------------+      +----------------------+
  |                   |      |       Linux          |
  |     IB MAD        |      |      Network         |
  |                   |      |       Stack          |
  +-------------------+      +----------------------+
                 |               |          |
                 |               |          |
        +----------------------------+      |
        |                            |      |
        |      OPA VNIC Module       |      |
        |  (OPA VNIC RDMA Netdev     |      |
        |     & EMA functions)       |      |
        |                            |      |
        +----------------------------+      |
                    |                       |
                    |                       |
           +------------------+             |
           |     IB core      |             |
           +------------------+             |
                    |                       |
                    |                       |
        +--------------------------------------------+
        |                                            |
        |      HFI1 Driver with VNIC support         |
        |                                            |
        +--------------------------------------------+
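
To make the packet format table in the Architecture section more concrete,
the following is a minimal Python sketch of how the first two quad words
could be assembled and how the transmit-side pad count could be derived.
It is not the driver's code. The function names (``pack_qw0``, ``pack_qw1``,
``pad_bytes``) are hypothetical, bit 0 is assumed to be the least
significant bit of each quad word, and the byte counts around the Ethernet
packet (20 before, 5 after) are read off the table rather than taken from
the implementation.

```python
L4_TYPE_ETHERNET = 0x78  # QW1 bits 0-7, value from the table
L2_16B = 0b10            # QW0 bits 61-62, value from the table

# Byte counts read off the table (an assumption of this sketch):
# QW0, QW1 and the first 4 bytes of QW2 precede the Ethernet packet;
# ICRC (4 bytes) and the Tail/LT byte follow it.
HDR_BYTES = 20
TRAILER_BYTES = 5


def pack_qw0(slid, dlid, length_qw, sc=0, rc=0, becn=0, fecn=0):
    """Assemble Quad Word 0, assuming bit 0 is the least significant bit."""
    qw0 = slid & 0xFFFFF                # bits 0-19: SLID (lower 20 bits)
    qw0 |= (length_qw & 0x7FF) << 20    # bits 20-30: Length in quad words
    qw0 |= (becn & 0x1) << 31           # bit 31: BECN
    qw0 |= (dlid & 0xFFFFF) << 32       # bits 32-51: DLID (lower 20 bits)
    qw0 |= (sc & 0x1F) << 52            # bits 52-56: SC
    qw0 |= (rc & 0x7) << 57             # bits 57-59: RC
    qw0 |= (fecn & 0x1) << 60           # bit 60: FECN
    qw0 |= L2_16B << 61                 # bits 61-62: L2 (16B format)
    qw0 |= 1 << 63                      # bit 63: LT (head flit)
    return qw0


def pack_qw1(slid, dlid, pkey, entropy):
    """Assemble Quad Word 1; bits 48-63 stay zero (reserved)."""
    qw1 = L4_TYPE_ETHERNET              # bits 0-7: L4 type
    qw1 |= ((slid >> 20) & 0xF) << 8    # bits 8-11: SLID[23:20]
    qw1 |= ((dlid >> 20) & 0xF) << 12   # bits 12-15: DLID[23:20]
    qw1 |= (pkey & 0xFFFF) << 16        # bits 16-31: PKEY
    qw1 |= (entropy & 0xFFFF) << 32     # bits 32-47: Entropy
    return qw1


def pad_bytes(eth_len):
    """Pad bytes that make header + packet + pad + trailer a multiple of 8."""
    return -(HDR_BYTES + eth_len + TRAILER_BYTES) % 8


if __name__ == "__main__":
    # Illustrative values only; they do not come from a real fabric.
    qw0 = pack_qw0(slid=0x123456, dlid=0xABCDEF, length_qw=11)
    qw1 = pack_qw1(slid=0x123456, dlid=0xABCDEF, pkey=0x8001, entropy=0x1234)
    print(hex(qw0), hex(qw1), pad_bytes(60))
```

Under these assumptions, a 60-byte Ethernet packet needs 3 pad bytes,
giving 20 + 60 + 3 + 5 = 88 bytes, or 11 quad words. The 24-bit LIDs are
split across the two quad words exactly as the table lists them: the lower
20 bits go into Quad Word 0 and bits [23:20] into Quad Word 1.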