vrf.c (9d066a252786e1a18484a6283f82614d42a9f4ac) | vrf.c (0d240e7811c4ec1965760ee4643b5bbc9cfacbb3) |
---|---|
1/* 2 * vrf.c: device driver to encapsulate a VRF space 3 * 4 * Copyright (c) 2015 Cumulus Networks. All rights reserved. 5 * Copyright (c) 2015 Shrijeet Mukherjee <shm@cumulusnetworks.com> 6 * Copyright (c) 2015 David Ahern <dsa@cumulusnetworks.com> 7 * 8 * Based on dummy, team and ipvlan drivers --- 21 unchanged lines hidden (view full) --- 30#include <net/arp.h> 31#include <net/ip.h> 32#include <net/ip_fib.h> 33#include <net/ip6_fib.h> 34#include <net/ip6_route.h> 35#include <net/route.h> 36#include <net/addrconf.h> 37#include <net/l3mdev.h> | 1/* 2 * vrf.c: device driver to encapsulate a VRF space 3 * 4 * Copyright (c) 2015 Cumulus Networks. All rights reserved. 5 * Copyright (c) 2015 Shrijeet Mukherjee <shm@cumulusnetworks.com> 6 * Copyright (c) 2015 David Ahern <dsa@cumulusnetworks.com> 7 * 8 * Based on dummy, team and ipvlan drivers --- 21 unchanged lines hidden (view full) --- 30#include <net/arp.h> 31#include <net/ip.h> 32#include <net/ip_fib.h> 33#include <net/ip6_fib.h> 34#include <net/ip6_route.h> 35#include <net/route.h> 36#include <net/addrconf.h> 37#include <net/l3mdev.h> |
38#include <net/fib_rules.h> |
|
38 39#define RT_FL_TOS(oldflp4) \ 40 ((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK)) 41 42#define DRV_NAME "vrf" 43#define DRV_VERSION "1.0" 44 | 39 40#define RT_FL_TOS(oldflp4) \ 41 ((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK)) 42 43#define DRV_NAME "vrf" 44#define DRV_VERSION "1.0" 45 |
46#define FIB_RULE_PREF 1000 /* default preference for FIB rules */ 47static bool add_fib_rules = true; 48 |
|
45struct net_vrf { 46 struct rtable __rcu *rth; | 49struct net_vrf { 50 struct rtable __rcu *rth; |
51 struct rtable __rcu *rth_local; |
|
47 struct rt6_info __rcu *rt6; | 52 struct rt6_info __rcu *rt6; |
53 struct rt6_info __rcu *rt6_local; |
|
48 u32 tb_id; 49}; 50 51struct pcpu_dstats { 52 u64 tx_pkts; 53 u64 tx_bytes; 54 u64 tx_drps; 55 u64 rx_pkts; 56 u64 rx_bytes; | 54 u32 tb_id; 55}; 56 57struct pcpu_dstats { 58 u64 tx_pkts; 59 u64 tx_bytes; 60 u64 tx_drps; 61 u64 rx_pkts; 62 u64 rx_bytes; |
63 u64 rx_drps; |
|
57 struct u64_stats_sync syncp; 58}; 59 | 64 struct u64_stats_sync syncp; 65}; 66 |
67static void vrf_rx_stats(struct net_device *dev, int len) 68{ 69 struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats); 70 71 u64_stats_update_begin(&dstats->syncp); 72 dstats->rx_pkts++; 73 dstats->rx_bytes += len; 74 u64_stats_update_end(&dstats->syncp); 75} 76 |
|
60static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb) 61{ 62 vrf_dev->stats.tx_errors++; 63 kfree_skb(skb); 64} 65 66static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev, 67 struct rtnl_link_stats64 *stats) --- 18 unchanged lines hidden (view full) --- 86 stats->tx_packets += tpkts; 87 stats->tx_dropped += tdrops; 88 stats->rx_bytes += rbytes; 89 stats->rx_packets += rpkts; 90 } 91 return stats; 92} 93 | 77static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb) 78{ 79 vrf_dev->stats.tx_errors++; 80 kfree_skb(skb); 81} 82 83static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev, 84 struct rtnl_link_stats64 *stats) --- 18 unchanged lines hidden (view full) --- 103 stats->tx_packets += tpkts; 104 stats->tx_dropped += tdrops; 105 stats->rx_bytes += rbytes; 106 stats->rx_packets += rpkts; 107 } 108 return stats; 109} 110 |
111/* Local traffic destined to local address. Reinsert the packet to rx 112 * path, similar to loopback handling. 113 */ 114static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev, 115 struct dst_entry *dst) 116{ 117 int len = skb->len; 118 119 skb_orphan(skb); 120 121 skb_dst_set(skb, dst); 122 skb_dst_force(skb); 123 124 /* set pkt_type to avoid skb hitting packet taps twice - 125 * once on Tx and again in Rx processing 126 */ 127 skb->pkt_type = PACKET_LOOPBACK; 128 129 skb->protocol = eth_type_trans(skb, dev); 130 131 if (likely(netif_rx(skb) == NET_RX_SUCCESS)) 132 vrf_rx_stats(dev, len); 133 else 134 this_cpu_inc(dev->dstats->rx_drps); 135 136 return NETDEV_TX_OK; 137} 138 |
|
94#if IS_ENABLED(CONFIG_IPV6) 95static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb, 96 struct net_device *dev) 97{ 98 const struct ipv6hdr *iph = ipv6_hdr(skb); 99 struct net *net = dev_net(skb->dev); 100 struct flowi6 fl6 = { 101 /* needed to match OIF rule */ --- 10 unchanged lines hidden (view full) --- 112 struct dst_entry *dst; 113 struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst; 114 115 dst = ip6_route_output(net, NULL, &fl6); 116 if (dst == dst_null) 117 goto err; 118 119 skb_dst_drop(skb); | 139#if IS_ENABLED(CONFIG_IPV6) 140static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb, 141 struct net_device *dev) 142{ 143 const struct ipv6hdr *iph = ipv6_hdr(skb); 144 struct net *net = dev_net(skb->dev); 145 struct flowi6 fl6 = { 146 /* needed to match OIF rule */ --- 10 unchanged lines hidden (view full) --- 157 struct dst_entry *dst; 158 struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst; 159 160 dst = ip6_route_output(net, NULL, &fl6); 161 if (dst == dst_null) 162 goto err; 163 164 skb_dst_drop(skb); |
165 166 /* if dst.dev is loopback or the VRF device again this is locally 167 * originated traffic destined to a local address. Short circuit 168 * to Rx path using our local dst 169 */ 170 if (dst->dev == net->loopback_dev || dst->dev == dev) { 171 struct net_vrf *vrf = netdev_priv(dev); 172 struct rt6_info *rt6_local; 173 174 /* release looked up dst and use cached local dst */ 175 dst_release(dst); 176 177 rcu_read_lock(); 178 179 rt6_local = rcu_dereference(vrf->rt6_local); 180 if (unlikely(!rt6_local)) { 181 rcu_read_unlock(); 182 goto err; 183 } 184 185 /* Ordering issue: cached local dst is created on newlink 186 * before the IPv6 initialization. Using the local dst 187 * requires rt6i_idev to be set so make sure it is. 188 */ 189 if (unlikely(!rt6_local->rt6i_idev)) { 190 rt6_local->rt6i_idev = in6_dev_get(dev); 191 if (!rt6_local->rt6i_idev) { 192 rcu_read_unlock(); 193 goto err; 194 } 195 } 196 197 dst = &rt6_local->dst; 198 dst_hold(dst); 199 200 rcu_read_unlock(); 201 202 return vrf_local_xmit(skb, dev, &rt6_local->dst); 203 } 204 |
|
120 skb_dst_set(skb, dst); 121 | 205 skb_dst_set(skb, dst); 206 |
207 /* strip the ethernet header added for pass through VRF device */ 208 __skb_pull(skb, skb_network_offset(skb)); 209 |
|
122 ret = ip6_local_out(net, skb->sk, skb); 123 if (unlikely(net_xmit_eval(ret))) 124 dev->stats.tx_errors++; 125 else 126 ret = NET_XMIT_SUCCESS; 127 128 return ret; 129err: --- 4 unchanged lines hidden (view full) --- 134static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb, 135 struct net_device *dev) 136{ 137 vrf_tx_error(dev, skb); 138 return NET_XMIT_DROP; 139} 140#endif 141 | 210 ret = ip6_local_out(net, skb->sk, skb); 211 if (unlikely(net_xmit_eval(ret))) 212 dev->stats.tx_errors++; 213 else 214 ret = NET_XMIT_SUCCESS; 215 216 return ret; 217err: --- 4 unchanged lines hidden (view full) --- 222static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb, 223 struct net_device *dev) 224{ 225 vrf_tx_error(dev, skb); 226 return NET_XMIT_DROP; 227} 228#endif 229 |
142static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4, 143 struct net_device *vrf_dev) 144{ 145 struct rtable *rt; 146 int err = 1; 147 148 rt = ip_route_output_flow(dev_net(vrf_dev), fl4, NULL); 149 if (IS_ERR(rt)) 150 goto out; 151 152 /* TO-DO: what about broadcast ? */ 153 if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) { 154 ip_rt_put(rt); 155 goto out; 156 } 157 158 skb_dst_drop(skb); 159 skb_dst_set(skb, &rt->dst); 160 err = 0; 161out: 162 return err; 163} 164 | |
165static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb, 166 struct net_device *vrf_dev) 167{ 168 struct iphdr *ip4h = ip_hdr(skb); 169 int ret = NET_XMIT_DROP; 170 struct flowi4 fl4 = { 171 /* needed to match OIF rule */ 172 .flowi4_oif = vrf_dev->ifindex, 173 .flowi4_iif = LOOPBACK_IFINDEX, 174 .flowi4_tos = RT_TOS(ip4h->tos), 175 .flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_L3MDEV_SRC | 176 FLOWI_FLAG_SKIP_NH_OIF, 177 .daddr = ip4h->daddr, 178 }; | 230static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb, 231 struct net_device *vrf_dev) 232{ 233 struct iphdr *ip4h = ip_hdr(skb); 234 int ret = NET_XMIT_DROP; 235 struct flowi4 fl4 = { 236 /* needed to match OIF rule */ 237 .flowi4_oif = vrf_dev->ifindex, 238 .flowi4_iif = LOOPBACK_IFINDEX, 239 .flowi4_tos = RT_TOS(ip4h->tos), 240 .flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_L3MDEV_SRC | 241 FLOWI_FLAG_SKIP_NH_OIF, 242 .daddr = ip4h->daddr, 243 }; |
244 struct net *net = dev_net(vrf_dev); 245 struct rtable *rt; |
|
179 | 246 |
180 if (vrf_send_v4_prep(skb, &fl4, vrf_dev)) | 247 rt = ip_route_output_flow(net, &fl4, NULL); 248 if (IS_ERR(rt)) |
181 goto err; 182 | 249 goto err; 250 |
251 if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) { 252 ip_rt_put(rt); 253 goto err; 254 } 255 256 skb_dst_drop(skb); 257 258 /* if dst.dev is loopback or the VRF device again this is locally 259 * originated traffic destined to a local address. Short circuit 260 * to Rx path using our local dst 261 */ 262 if (rt->dst.dev == net->loopback_dev || rt->dst.dev == vrf_dev) { 263 struct net_vrf *vrf = netdev_priv(vrf_dev); 264 struct rtable *rth_local; 265 struct dst_entry *dst = NULL; 266 267 ip_rt_put(rt); 268 269 rcu_read_lock(); 270 271 rth_local = rcu_dereference(vrf->rth_local); 272 if (likely(rth_local)) { 273 dst = &rth_local->dst; 274 dst_hold(dst); 275 } 276 277 rcu_read_unlock(); 278 279 if (unlikely(!dst)) 280 goto err; 281 282 return vrf_local_xmit(skb, vrf_dev, dst); 283 } 284 285 skb_dst_set(skb, &rt->dst); 286 287 /* strip the ethernet header added for pass through VRF device */ 288 __skb_pull(skb, skb_network_offset(skb)); 289 |
|
183 if (!ip4h->saddr) { 184 ip4h->saddr = inet_select_addr(skb_dst(skb)->dev, 0, 185 RT_SCOPE_LINK); 186 } 187 188 ret = ip_local_out(dev_net(skb_dst(skb)->dev), skb->sk, skb); 189 if (unlikely(net_xmit_eval(ret))) 190 vrf_dev->stats.tx_errors++; --- 4 unchanged lines hidden (view full) --- 195 return ret; 196err: 197 vrf_tx_error(vrf_dev, skb); 198 goto out; 199} 200 201static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev) 202{ | 290 if (!ip4h->saddr) { 291 ip4h->saddr = inet_select_addr(skb_dst(skb)->dev, 0, 292 RT_SCOPE_LINK); 293 } 294 295 ret = ip_local_out(dev_net(skb_dst(skb)->dev), skb->sk, skb); 296 if (unlikely(net_xmit_eval(ret))) 297 vrf_dev->stats.tx_errors++; --- 4 unchanged lines hidden (view full) --- 302 return ret; 303err: 304 vrf_tx_error(vrf_dev, skb); 305 goto out; 306} 307 308static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev) 309{ |
203 /* strip the ethernet header added for pass through VRF device */ 204 __skb_pull(skb, skb_network_offset(skb)); 205 | |
206 switch (skb->protocol) { 207 case htons(ETH_P_IP): 208 return vrf_process_v4_outbound(skb, dev); 209 case htons(ETH_P_IPV6): 210 return vrf_process_v6_outbound(skb, dev); 211 default: 212 vrf_tx_error(dev, skb); 213 return NET_XMIT_DROP; --- 55 unchanged lines hidden (view full) --- 269{ 270 return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, 271 net, sk, skb, NULL, skb_dst(skb)->dev, 272 vrf_finish_output6, 273 !(IP6CB(skb)->flags & IP6SKB_REROUTED)); 274} 275 276/* holding rtnl */ | 310 switch (skb->protocol) { 311 case htons(ETH_P_IP): 312 return vrf_process_v4_outbound(skb, dev); 313 case htons(ETH_P_IPV6): 314 return vrf_process_v6_outbound(skb, dev); 315 default: 316 vrf_tx_error(dev, skb); 317 return NET_XMIT_DROP; --- 55 unchanged lines hidden (view full) --- 373{ 374 return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, 375 net, sk, skb, NULL, skb_dst(skb)->dev, 376 vrf_finish_output6, 377 !(IP6CB(skb)->flags & IP6SKB_REROUTED)); 378} 379 380/* holding rtnl */ |
277static void vrf_rt6_release(struct net_vrf *vrf) | 381static void vrf_rt6_release(struct net_device *dev, struct net_vrf *vrf) |
278{ 279 struct rt6_info *rt6 = rtnl_dereference(vrf->rt6); | 382{ 383 struct rt6_info *rt6 = rtnl_dereference(vrf->rt6); |
384 struct rt6_info *rt6_local = rtnl_dereference(vrf->rt6_local); 385 struct net *net = dev_net(dev); 386 struct dst_entry *dst; |
|
280 | 387 |
281 rcu_assign_pointer(vrf->rt6, NULL); | 388 RCU_INIT_POINTER(vrf->rt6, NULL); 389 RCU_INIT_POINTER(vrf->rt6_local, NULL); 390 synchronize_rcu(); |
282 | 391 |
283 if (rt6) 284 dst_release(&rt6->dst); | 392 /* move dev in dst's to loopback so this VRF device can be deleted 393 * - based on dst_ifdown 394 */ 395 if (rt6) { 396 dst = &rt6->dst; 397 dev_put(dst->dev); 398 dst->dev = net->loopback_dev; 399 dev_hold(dst->dev); 400 dst_release(dst); 401 } 402 403 if (rt6_local) { 404 if (rt6_local->rt6i_idev) 405 in6_dev_put(rt6_local->rt6i_idev); 406 407 dst = &rt6_local->dst; 408 dev_put(dst->dev); 409 dst->dev = net->loopback_dev; 410 dev_hold(dst->dev); 411 dst_release(dst); 412 } |
285} 286 287static int vrf_rt6_create(struct net_device *dev) 288{ | 413} 414 415static int vrf_rt6_create(struct net_device *dev) 416{ |
417 int flags = DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE; |
|
289 struct net_vrf *vrf = netdev_priv(dev); 290 struct net *net = dev_net(dev); 291 struct fib6_table *rt6i_table; | 418 struct net_vrf *vrf = netdev_priv(dev); 419 struct net *net = dev_net(dev); 420 struct fib6_table *rt6i_table; |
292 struct rt6_info *rt6; | 421 struct rt6_info *rt6, *rt6_local; |
293 int rc = -ENOMEM; 294 | 422 int rc = -ENOMEM; 423 |
424 /* IPv6 can be CONFIG enabled and then disabled runtime */ 425 if (!ipv6_mod_enabled()) 426 return 0; 427 |
|
295 rt6i_table = fib6_new_table(net, vrf->tb_id); 296 if (!rt6i_table) 297 goto out; 298 | 428 rt6i_table = fib6_new_table(net, vrf->tb_id); 429 if (!rt6i_table) 430 goto out; 431 |
299 rt6 = ip6_dst_alloc(net, dev, 300 DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE); | 432 /* create a dst for routing packets out a VRF device */ 433 rt6 = ip6_dst_alloc(net, dev, flags); |
301 if (!rt6) 302 goto out; 303 304 dst_hold(&rt6->dst); 305 306 rt6->rt6i_table = rt6i_table; 307 rt6->dst.output = vrf_output6; | 434 if (!rt6) 435 goto out; 436 437 dst_hold(&rt6->dst); 438 439 rt6->rt6i_table = rt6i_table; 440 rt6->dst.output = vrf_output6; |
441 442 /* create a dst for local routing - packets sent locally 443 * to local address via the VRF device as a loopback 444 */ 445 rt6_local = ip6_dst_alloc(net, dev, flags); 446 if (!rt6_local) { 447 dst_release(&rt6->dst); 448 goto out; 449 } 450 451 dst_hold(&rt6_local->dst); 452 453 rt6_local->rt6i_idev = in6_dev_get(dev); 454 rt6_local->rt6i_flags = RTF_UP | RTF_NONEXTHOP | RTF_LOCAL; 455 rt6_local->rt6i_table = rt6i_table; 456 rt6_local->dst.input = ip6_input; 457 |
|
308 rcu_assign_pointer(vrf->rt6, rt6); | 458 rcu_assign_pointer(vrf->rt6, rt6); |
459 rcu_assign_pointer(vrf->rt6_local, rt6_local); |
|
309 310 rc = 0; 311out: 312 return rc; 313} 314#else | 460 461 rc = 0; 462out: 463 return rc; 464} 465#else |
315static void vrf_rt6_release(struct net_vrf *vrf) | 466static void vrf_rt6_release(struct net_device *dev, struct net_vrf *vrf) |
316{ 317} 318 319static int vrf_rt6_create(struct net_device *dev) 320{ 321 return 0; 322} 323#endif --- 52 unchanged lines hidden (view full) --- 376 377 return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, 378 net, sk, skb, NULL, dev, 379 vrf_finish_output, 380 !(IPCB(skb)->flags & IPSKB_REROUTED)); 381} 382 383/* holding rtnl */ | 467{ 468} 469 470static int vrf_rt6_create(struct net_device *dev) 471{ 472 return 0; 473} 474#endif --- 52 unchanged lines hidden (view full) --- 527 528 return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, 529 net, sk, skb, NULL, dev, 530 vrf_finish_output, 531 !(IPCB(skb)->flags & IPSKB_REROUTED)); 532} 533 534/* holding rtnl */ |
384static void vrf_rtable_release(struct net_vrf *vrf) | 535static void vrf_rtable_release(struct net_device *dev, struct net_vrf *vrf) |
385{ 386 struct rtable *rth = rtnl_dereference(vrf->rth); | 536{ 537 struct rtable *rth = rtnl_dereference(vrf->rth); |
538 struct rtable *rth_local = rtnl_dereference(vrf->rth_local); 539 struct net *net = dev_net(dev); 540 struct dst_entry *dst; |
|
387 | 541 |
388 rcu_assign_pointer(vrf->rth, NULL); | 542 RCU_INIT_POINTER(vrf->rth, NULL); 543 RCU_INIT_POINTER(vrf->rth_local, NULL); 544 synchronize_rcu(); |
389 | 545 |
390 if (rth) 391 dst_release(&rth->dst); | 546 /* move dev in dst's to loopback so this VRF device can be deleted 547 * - based on dst_ifdown 548 */ 549 if (rth) { 550 dst = &rth->dst; 551 dev_put(dst->dev); 552 dst->dev = net->loopback_dev; 553 dev_hold(dst->dev); 554 dst_release(dst); 555 } 556 557 if (rth_local) { 558 dst = &rth_local->dst; 559 dev_put(dst->dev); 560 dst->dev = net->loopback_dev; 561 dev_hold(dst->dev); 562 dst_release(dst); 563 } |
392} 393 394static int vrf_rtable_create(struct net_device *dev) 395{ 396 struct net_vrf *vrf = netdev_priv(dev); | 564} 565 566static int vrf_rtable_create(struct net_device *dev) 567{ 568 struct net_vrf *vrf = netdev_priv(dev); |
397 struct rtable *rth; | 569 struct rtable *rth, *rth_local; |
398 399 if (!fib_new_table(dev_net(dev), vrf->tb_id)) 400 return -ENOMEM; 401 | 570 571 if (!fib_new_table(dev_net(dev), vrf->tb_id)) 572 return -ENOMEM; 573 |
574 /* create a dst for routing packets out through a VRF device */ |
|
402 rth = rt_dst_alloc(dev, 0, RTN_UNICAST, 1, 1, 0); 403 if (!rth) 404 return -ENOMEM; 405 | 575 rth = rt_dst_alloc(dev, 0, RTN_UNICAST, 1, 1, 0); 576 if (!rth) 577 return -ENOMEM; 578 |
579 /* create a dst for local ingress routing - packets sent locally 580 * to local address via the VRF device as a loopback 581 */ 582 rth_local = rt_dst_alloc(dev, RTCF_LOCAL, RTN_LOCAL, 1, 1, 0); 583 if (!rth_local) { 584 dst_release(&rth->dst); 585 return -ENOMEM; 586 } 587 |
|
406 rth->dst.output = vrf_output; 407 rth->rt_table_id = vrf->tb_id; 408 | 588 rth->dst.output = vrf_output; 589 rth->rt_table_id = vrf->tb_id; 590 |
591 rth_local->rt_table_id = vrf->tb_id; 592 |
|
409 rcu_assign_pointer(vrf->rth, rth); | 593 rcu_assign_pointer(vrf->rth, rth); |
594 rcu_assign_pointer(vrf->rth_local, rth_local); |
|
410 411 return 0; 412} 413 414/**************************** device handling ********************/ 415 416/* cycle interface to flush neighbor cache and move routes across tables */ 417static void cycle_netdev(struct net_device *dev) --- 54 unchanged lines hidden (view full) --- 472} 473 474static void vrf_dev_uninit(struct net_device *dev) 475{ 476 struct net_vrf *vrf = netdev_priv(dev); 477 struct net_device *port_dev; 478 struct list_head *iter; 479 | 595 596 return 0; 597} 598 599/**************************** device handling ********************/ 600 601/* cycle interface to flush neighbor cache and move routes across tables */ 602static void cycle_netdev(struct net_device *dev) --- 54 unchanged lines hidden (view full) --- 657} 658 659static void vrf_dev_uninit(struct net_device *dev) 660{ 661 struct net_vrf *vrf = netdev_priv(dev); 662 struct net_device *port_dev; 663 struct list_head *iter; 664 |
480 vrf_rtable_release(vrf); 481 vrf_rt6_release(vrf); | 665 vrf_rtable_release(dev, vrf); 666 vrf_rt6_release(dev, vrf); |
482 483 netdev_for_each_lower_dev(dev, port_dev, iter) 484 vrf_del_slave(dev, port_dev); 485 486 free_percpu(dev->dstats); 487 dev->dstats = NULL; 488} 489 --- 9 unchanged lines hidden (view full) --- 499 if (vrf_rtable_create(dev) != 0) 500 goto out_stats; 501 502 if (vrf_rt6_create(dev) != 0) 503 goto out_rth; 504 505 dev->flags = IFF_MASTER | IFF_NOARP; 506 | 667 668 netdev_for_each_lower_dev(dev, port_dev, iter) 669 vrf_del_slave(dev, port_dev); 670 671 free_percpu(dev->dstats); 672 dev->dstats = NULL; 673} 674 --- 9 unchanged lines hidden (view full) --- 684 if (vrf_rtable_create(dev) != 0) 685 goto out_stats; 686 687 if (vrf_rt6_create(dev) != 0) 688 goto out_rth; 689 690 dev->flags = IFF_MASTER | IFF_NOARP; 691 |
692 /* MTU is irrelevant for VRF device; set to 64k similar to lo */ 693 dev->mtu = 64 * 1024; 694 695 /* similarly, oper state is irrelevant; set to up to avoid confusion */ 696 dev->operstate = IF_OPER_UP; 697 netdev_lockdep_set_classes(dev); |
|
507 return 0; 508 509out_rth: | 698 return 0; 699 700out_rth: |
510 vrf_rtable_release(vrf); | 701 vrf_rtable_release(dev, vrf); |
511out_stats: 512 free_percpu(dev->dstats); 513 dev->dstats = NULL; 514out_nomem: 515 return -ENOMEM; 516} 517 518static const struct net_device_ops vrf_netdev_ops = { --- 99 unchanged lines hidden (view full) --- 618 break; 619 } 620 } 621 622out: 623 return rc; 624} 625 | 702out_stats: 703 free_percpu(dev->dstats); 704 dev->dstats = NULL; 705out_nomem: 706 return -ENOMEM; 707} 708 709static const struct net_device_ops vrf_netdev_ops = { --- 99 unchanged lines hidden (view full) --- 809 break; 810 } 811 } 812 813out: 814 return rc; 815} 816 |
817static struct rt6_info *vrf_ip6_route_lookup(struct net *net, 818 const struct net_device *dev, 819 struct flowi6 *fl6, 820 int ifindex, 821 int flags) 822{ 823 struct net_vrf *vrf = netdev_priv(dev); 824 struct fib6_table *table = NULL; 825 struct rt6_info *rt6; 826 827 rcu_read_lock(); 828 829 /* fib6_table does not have a refcnt and can not be freed */ 830 rt6 = rcu_dereference(vrf->rt6); 831 if (likely(rt6)) 832 table = rt6->rt6i_table; 833 834 rcu_read_unlock(); 835 836 if (!table) 837 return NULL; 838 839 return ip6_pol_route(net, table, ifindex, fl6, flags); 840} 841 842static void vrf_ip6_input_dst(struct sk_buff *skb, struct net_device *vrf_dev, 843 int ifindex) 844{ 845 const struct ipv6hdr *iph = ipv6_hdr(skb); 846 struct flowi6 fl6 = { 847 .daddr = iph->daddr, 848 .saddr = iph->saddr, 849 .flowlabel = ip6_flowinfo(iph), 850 .flowi6_mark = skb->mark, 851 .flowi6_proto = iph->nexthdr, 852 .flowi6_iif = ifindex, 853 }; 854 struct net *net = dev_net(vrf_dev); 855 struct rt6_info *rt6; 856 857 rt6 = vrf_ip6_route_lookup(net, vrf_dev, &fl6, ifindex, 858 RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_IFACE); 859 if (unlikely(!rt6)) 860 return; 861 862 if (unlikely(&rt6->dst == &net->ipv6.ip6_null_entry->dst)) 863 return; 864 865 skb_dst_set(skb, &rt6->dst); 866} 867 |
|
626static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, 627 struct sk_buff *skb) 628{ | 868static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, 869 struct sk_buff *skb) 870{ |
629 /* if packet is NDISC keep the ingress interface */ 630 if (!ipv6_ndisc_frame(skb)) { | 871 int orig_iif = skb->skb_iif; 872 bool need_strict; 873 874 /* loopback traffic; do not push through packet taps again. 875 * Reset pkt_type for upper layers to process skb 876 */ 877 if (skb->pkt_type == PACKET_LOOPBACK) { |
631 skb->dev = vrf_dev; 632 skb->skb_iif = vrf_dev->ifindex; | 878 skb->dev = vrf_dev; 879 skb->skb_iif = vrf_dev->ifindex; |
880 skb->pkt_type = PACKET_HOST; 881 goto out; 882 } |
|
633 | 883 |
884 /* if packet is NDISC or addressed to multicast or link-local 885 * then keep the ingress interface 886 */ 887 need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr); 888 if (!ipv6_ndisc_frame(skb) && !need_strict) { 889 skb->dev = vrf_dev; 890 skb->skb_iif = vrf_dev->ifindex; 891 |
|
634 skb_push(skb, skb->mac_len); 635 dev_queue_xmit_nit(skb, vrf_dev); 636 skb_pull(skb, skb->mac_len); 637 638 IP6CB(skb)->flags |= IP6SKB_L3SLAVE; 639 } 640 | 892 skb_push(skb, skb->mac_len); 893 dev_queue_xmit_nit(skb, vrf_dev); 894 skb_pull(skb, skb->mac_len); 895 896 IP6CB(skb)->flags |= IP6SKB_L3SLAVE; 897 } 898 |
899 if (need_strict) 900 vrf_ip6_input_dst(skb, vrf_dev, orig_iif); 901 902out: |
|
641 return skb; 642} 643 644#else 645static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, 646 struct sk_buff *skb) 647{ 648 return skb; 649} 650#endif 651 652static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, 653 struct sk_buff *skb) 654{ 655 skb->dev = vrf_dev; 656 skb->skb_iif = vrf_dev->ifindex; 657 | 903 return skb; 904} 905 906#else 907static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, 908 struct sk_buff *skb) 909{ 910 return skb; 911} 912#endif 913 914static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, 915 struct sk_buff *skb) 916{ 917 skb->dev = vrf_dev; 918 skb->skb_iif = vrf_dev->ifindex; 919 |
920 /* loopback traffic; do not push through packet taps again. 921 * Reset pkt_type for upper layers to process skb 922 */ 923 if (skb->pkt_type == PACKET_LOOPBACK) { 924 skb->pkt_type = PACKET_HOST; 925 goto out; 926 } 927 |
|
658 skb_push(skb, skb->mac_len); 659 dev_queue_xmit_nit(skb, vrf_dev); 660 skb_pull(skb, skb->mac_len); 661 | 928 skb_push(skb, skb->mac_len); 929 dev_queue_xmit_nit(skb, vrf_dev); 930 skb_pull(skb, skb->mac_len); 931 |
932out: |
|
662 return skb; 663} 664 665/* called with rcu lock held */ 666static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev, 667 struct sk_buff *skb, 668 u16 proto) 669{ --- 4 unchanged lines hidden (view full) --- 674 return vrf_ip6_rcv(vrf_dev, skb); 675 } 676 677 return skb; 678} 679 680#if IS_ENABLED(CONFIG_IPV6) 681static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev, | 933 return skb; 934} 935 936/* called with rcu lock held */ 937static struct sk_buff *vrf_l3_rcv(struct net_device *vrf_dev, 938 struct sk_buff *skb, 939 u16 proto) 940{ --- 4 unchanged lines hidden (view full) --- 945 return vrf_ip6_rcv(vrf_dev, skb); 946 } 947 948 return skb; 949} 950 951#if IS_ENABLED(CONFIG_IPV6) 952static struct dst_entry *vrf_get_rt6_dst(const struct net_device *dev, |
682 const struct flowi6 *fl6) | 953 struct flowi6 *fl6) |
683{ | 954{ |
955 bool need_strict = rt6_need_strict(&fl6->daddr); 956 struct net_vrf *vrf = netdev_priv(dev); 957 struct net *net = dev_net(dev); |
|
684 struct dst_entry *dst = NULL; | 958 struct dst_entry *dst = NULL; |
959 struct rt6_info *rt; |
|
685 | 960 |
686 if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) { 687 struct net_vrf *vrf = netdev_priv(dev); 688 struct rt6_info *rt; | 961 /* send to link-local or multicast address */ 962 if (need_strict) { 963 int flags = RT6_LOOKUP_F_IFACE; |
689 | 964 |
965 /* VRF device does not have a link-local address and 966 * sending packets to link-local or mcast addresses over 967 * a VRF device does not make sense 968 */ 969 if (fl6->flowi6_oif == dev->ifindex) { 970 struct dst_entry *dst = &net->ipv6.ip6_null_entry->dst; 971 972 dst_hold(dst); 973 return dst; 974 } 975 976 if (!ipv6_addr_any(&fl6->saddr)) 977 flags |= RT6_LOOKUP_F_HAS_SADDR; 978 979 rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, flags); 980 if (rt) 981 dst = &rt->dst; 982 983 } else if (!(fl6->flowi6_flags & FLOWI_FLAG_L3MDEV_SRC)) { 984 |
|
690 rcu_read_lock(); 691 692 rt = rcu_dereference(vrf->rt6); 693 if (likely(rt)) { 694 dst = &rt->dst; 695 dst_hold(dst); 696 } 697 698 rcu_read_unlock(); 699 } 700 | 985 rcu_read_lock(); 986 987 rt = rcu_dereference(vrf->rt6); 988 if (likely(rt)) { 989 dst = &rt->dst; 990 dst_hold(dst); 991 } 992 993 rcu_read_unlock(); 994 } 995 |
996 /* make sure oif is set to VRF device for lookup */ 997 if (!need_strict) 998 fl6->flowi6_oif = dev->ifindex; 999 |
|
701 return dst; 702} | 1000 return dst; 1001} |
1002 1003/* called under rcu_read_lock */ 1004static int vrf_get_saddr6(struct net_device *dev, const struct sock *sk, 1005 struct flowi6 *fl6) 1006{ 1007 struct net *net = dev_net(dev); 1008 struct dst_entry *dst; 1009 struct rt6_info *rt; 1010 int err; 1011 1012 if (rt6_need_strict(&fl6->daddr)) { 1013 rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, 1014 RT6_LOOKUP_F_IFACE); 1015 if (unlikely(!rt)) 1016 return 0; 1017 1018 dst = &rt->dst; 1019 } else { 1020 __u8 flags = fl6->flowi6_flags; 1021 1022 fl6->flowi6_flags |= FLOWI_FLAG_L3MDEV_SRC; 1023 fl6->flowi6_flags |= FLOWI_FLAG_SKIP_NH_OIF; 1024 1025 dst = ip6_route_output(net, sk, fl6); 1026 rt = (struct rt6_info *)dst; 1027 1028 fl6->flowi6_flags = flags; 1029 } 1030 1031 err = dst->error; 1032 if (!err) { 1033 err = ip6_route_get_saddr(net, rt, &fl6->daddr, 1034 sk ? inet6_sk(sk)->srcprefs : 0, 1035 &fl6->saddr); 1036 } 1037 1038 dst_release(dst); 1039 1040 return err; 1041} |
|
703#endif 704 705static const struct l3mdev_ops vrf_l3mdev_ops = { 706 .l3mdev_fib_table = vrf_fib_table, 707 .l3mdev_get_rtable = vrf_get_rtable, 708 .l3mdev_get_saddr = vrf_get_saddr, 709 .l3mdev_l3_rcv = vrf_l3_rcv, 710#if IS_ENABLED(CONFIG_IPV6) 711 .l3mdev_get_rt6_dst = vrf_get_rt6_dst, | 1042#endif 1043 1044static const struct l3mdev_ops vrf_l3mdev_ops = { 1045 .l3mdev_fib_table = vrf_fib_table, 1046 .l3mdev_get_rtable = vrf_get_rtable, 1047 .l3mdev_get_saddr = vrf_get_saddr, 1048 .l3mdev_l3_rcv = vrf_l3_rcv, 1049#if IS_ENABLED(CONFIG_IPV6) 1050 .l3mdev_get_rt6_dst = vrf_get_rt6_dst, |
1051 .l3mdev_get_saddr6 = vrf_get_saddr6, |
|
712#endif 713}; 714 715static void vrf_get_drvinfo(struct net_device *dev, 716 struct ethtool_drvinfo *info) 717{ 718 strlcpy(info->driver, DRV_NAME, sizeof(info->driver)); 719 strlcpy(info->version, DRV_VERSION, sizeof(info->version)); 720} 721 722static const struct ethtool_ops vrf_ethtool_ops = { 723 .get_drvinfo = vrf_get_drvinfo, 724}; 725 | 1052#endif 1053}; 1054 1055static void vrf_get_drvinfo(struct net_device *dev, 1056 struct ethtool_drvinfo *info) 1057{ 1058 strlcpy(info->driver, DRV_NAME, sizeof(info->driver)); 1059 strlcpy(info->version, DRV_VERSION, sizeof(info->version)); 1060} 1061 1062static const struct ethtool_ops vrf_ethtool_ops = { 1063 .get_drvinfo = vrf_get_drvinfo, 1064}; 1065 |
1066static inline size_t vrf_fib_rule_nl_size(void) 1067{ 1068 size_t sz; 1069 1070 sz = NLMSG_ALIGN(sizeof(struct fib_rule_hdr)); 1071 sz += nla_total_size(sizeof(u8)); /* FRA_L3MDEV */ 1072 sz += nla_total_size(sizeof(u32)); /* FRA_PRIORITY */ 1073 1074 return sz; 1075} 1076 1077static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it) 1078{ 1079 struct fib_rule_hdr *frh; 1080 struct nlmsghdr *nlh; 1081 struct sk_buff *skb; 1082 int err; 1083 1084 if (family == AF_INET6 && !ipv6_mod_enabled()) 1085 return 0; 1086 1087 skb = nlmsg_new(vrf_fib_rule_nl_size(), GFP_KERNEL); 1088 if (!skb) 1089 return -ENOMEM; 1090 1091 nlh = nlmsg_put(skb, 0, 0, 0, sizeof(*frh), 0); 1092 if (!nlh) 1093 goto nla_put_failure; 1094 1095 /* rule only needs to appear once */ 1096 nlh->nlmsg_flags &= NLM_F_EXCL; 1097 1098 frh = nlmsg_data(nlh); 1099 memset(frh, 0, sizeof(*frh)); 1100 frh->family = family; 1101 frh->action = FR_ACT_TO_TBL; 1102 1103 if (nla_put_u32(skb, FRA_L3MDEV, 1)) 1104 goto nla_put_failure; 1105 1106 if (nla_put_u32(skb, FRA_PRIORITY, FIB_RULE_PREF)) 1107 goto nla_put_failure; 1108 1109 nlmsg_end(skb, nlh); 1110 1111 /* fib_nl_{new,del}rule handling looks for net from skb->sk */ 1112 skb->sk = dev_net(dev)->rtnl; 1113 if (add_it) { 1114 err = fib_nl_newrule(skb, nlh); 1115 if (err == -EEXIST) 1116 err = 0; 1117 } else { 1118 err = fib_nl_delrule(skb, nlh); 1119 if (err == -ENOENT) 1120 err = 0; 1121 } 1122 nlmsg_free(skb); 1123 1124 return err; 1125 1126nla_put_failure: 1127 nlmsg_free(skb); 1128 1129 return -EMSGSIZE; 1130} 1131 1132static int vrf_add_fib_rules(const struct net_device *dev) 1133{ 1134 int err; 1135 1136 err = vrf_fib_rule(dev, AF_INET, true); 1137 if (err < 0) 1138 goto out_err; 1139 1140 err = vrf_fib_rule(dev, AF_INET6, true); 1141 if (err < 0) 1142 goto ipv6_err; 1143 1144 return 0; 1145 1146ipv6_err: 1147 vrf_fib_rule(dev, AF_INET, false); 1148 1149out_err: 1150 netdev_err(dev, "Failed to add FIB rules.\n"); 1151 return err; 1152} 1153 |
|
726static void vrf_setup(struct net_device *dev) 727{ 728 ether_setup(dev); 729 730 /* Initialize the device structure. */ 731 dev->netdev_ops = &vrf_netdev_ops; 732 dev->l3mdev_ops = &vrf_l3mdev_ops; 733 dev->ethtool_ops = &vrf_ethtool_ops; 734 dev->destructor = free_netdev; 735 736 /* Fill in device structure with ethernet-generic values. */ 737 eth_hw_addr_random(dev); 738 739 /* don't acquire vrf device's netif_tx_lock when transmitting */ 740 dev->features |= NETIF_F_LLTX; 741 742 /* don't allow vrf devices to change network namespaces. */ 743 dev->features |= NETIF_F_NETNS_LOCAL; | 1154static void vrf_setup(struct net_device *dev) 1155{ 1156 ether_setup(dev); 1157 1158 /* Initialize the device structure. */ 1159 dev->netdev_ops = &vrf_netdev_ops; 1160 dev->l3mdev_ops = &vrf_l3mdev_ops; 1161 dev->ethtool_ops = &vrf_ethtool_ops; 1162 dev->destructor = free_netdev; 1163 1164 /* Fill in device structure with ethernet-generic values. */ 1165 eth_hw_addr_random(dev); 1166 1167 /* don't acquire vrf device's netif_tx_lock when transmitting */ 1168 dev->features |= NETIF_F_LLTX; 1169 1170 /* don't allow vrf devices to change network namespaces. */ 1171 dev->features |= NETIF_F_NETNS_LOCAL; |
1172 1173 /* does not make sense for a VLAN to be added to a vrf device */ 1174 dev->features |= NETIF_F_VLAN_CHALLENGED; 1175 1176 /* enable offload features */ 1177 dev->features |= NETIF_F_GSO_SOFTWARE; 1178 dev->features |= NETIF_F_RXCSUM | NETIF_F_HW_CSUM; 1179 dev->features |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA; 1180 1181 dev->hw_features = dev->features; 1182 dev->hw_enc_features = dev->features; 1183 1184 /* default to no qdisc; user can add if desired */ 1185 dev->priv_flags |= IFF_NO_QUEUE; |
|
744} 745 746static int vrf_validate(struct nlattr *tb[], struct nlattr *data[]) 747{ 748 if (tb[IFLA_ADDRESS]) { 749 if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) 750 return -EINVAL; 751 if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) --- 6 unchanged lines hidden (view full) --- 758{ 759 unregister_netdevice_queue(dev, head); 760} 761 762static int vrf_newlink(struct net *src_net, struct net_device *dev, 763 struct nlattr *tb[], struct nlattr *data[]) 764{ 765 struct net_vrf *vrf = netdev_priv(dev); | 1186} 1187 1188static int vrf_validate(struct nlattr *tb[], struct nlattr *data[]) 1189{ 1190 if (tb[IFLA_ADDRESS]) { 1191 if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) 1192 return -EINVAL; 1193 if (!is_valid_ether_addr(nla_data(tb[IFLA_ADDRESS]))) --- 6 unchanged lines hidden (view full) --- 1200{ 1201 unregister_netdevice_queue(dev, head); 1202} 1203 1204static int vrf_newlink(struct net *src_net, struct net_device *dev, 1205 struct nlattr *tb[], struct nlattr *data[]) 1206{ 1207 struct net_vrf *vrf = netdev_priv(dev); |
1208 int err; |
|
766 767 if (!data || !data[IFLA_VRF_TABLE]) 768 return -EINVAL; 769 770 vrf->tb_id = nla_get_u32(data[IFLA_VRF_TABLE]); 771 772 dev->priv_flags |= IFF_L3MDEV_MASTER; 773 | 1209 1210 if (!data || !data[IFLA_VRF_TABLE]) 1211 return -EINVAL; 1212 1213 vrf->tb_id = nla_get_u32(data[IFLA_VRF_TABLE]); 1214 1215 dev->priv_flags |= IFF_L3MDEV_MASTER; 1216 |
774 return register_netdevice(dev); | 1217 err = register_netdevice(dev); 1218 if (err) 1219 goto out; 1220 1221 if (add_fib_rules) { 1222 err = vrf_add_fib_rules(dev); 1223 if (err) { 1224 unregister_netdevice(dev); 1225 goto out; 1226 } 1227 add_fib_rules = false; 1228 } 1229 1230out: 1231 return err; |
775} 776 777static size_t vrf_nl_getsize(const struct net_device *dev) 778{ 779 return nla_total_size(sizeof(u32)); /* IFLA_VRF_TABLE */ 780} 781 782static int vrf_fillinfo(struct sk_buff *skb, --- 93 unchanged lines hidden --- | 1232} 1233 1234static size_t vrf_nl_getsize(const struct net_device *dev) 1235{ 1236 return nla_total_size(sizeof(u32)); /* IFLA_VRF_TABLE */ 1237} 1238 1239static int vrf_fillinfo(struct sk_buff *skb, --- 93 unchanged lines hidden --- |
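
A note on `vrf_fib_rule()` in the newer revision: the header is created with `nlmsg_put(skb, 0, 0, 0, sizeof(*frh), 0)`, so its flags argument is 0, and the next statement applies `&= NLM_F_EXCL`. Assuming `nlmsg_put()` stores that argument into `nlmsg_flags` unchanged, AND-ing a mask into a zero field leaves it zero, so the flag that the "rule only needs to appear once" comment refers to would not actually be set. The sketch below is a user-space illustration of that bit arithmetic only, not kernel code: `sketch_nlmsghdr`, `SKETCH_NLM_F_EXCL`, `flags_after_and()` and `flags_after_or()` are hypothetical names, and the constant copies the value of `NLM_F_EXCL` from `<linux/netlink.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as NLM_F_EXCL in <linux/netlink.h>; redefined here
 * (hypothetical name) so the sketch builds without kernel headers.
 */
#define SKETCH_NLM_F_EXCL 0x200

/* Hypothetical stand-in for the single nlmsghdr field that matters here. */
struct sketch_nlmsghdr {
	uint16_t nlmsg_flags;
};

/* Mirrors the statement in the diff: AND-ing into zeroed flags keeps them 0. */
uint16_t flags_after_and(void)
{
	struct sketch_nlmsghdr nlh = { .nlmsg_flags = 0 };

	nlh.nlmsg_flags &= SKETCH_NLM_F_EXCL;
	return nlh.nlmsg_flags;
}

/* OR-ing is the operation that would actually set the flag. */
uint16_t flags_after_or(void)
{
	struct sketch_nlmsghdr nlh = { .nlmsg_flags = 0 };

	nlh.nlmsg_flags |= SKETCH_NLM_F_EXCL;
	return nlh.nlmsg_flags;
}
```

Calling `flags_after_and()` yields 0, while `flags_after_or()` yields `SKETCH_NLM_F_EXCL`. Whether the missing flag changes behaviour in the driver is a separate question: the add path in the diff already treats `-EEXIST` from `fib_nl_newrule()` as success.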