/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system. INET is implemented using the BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		The Internet Protocol (IP) module.
 *
 * Authors:	Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *		Donald Becker, <becker@super.org>
 *		Alan Cox, <alan@lxorguk.ukuu.org.uk>
 *		Richard Underwood
 *		Stefan Becker, <stefanb@yello.ping.de>
 *		Jorge Cwik, <jorge@laser.satlink.net>
 *		Arnt Gulbrandsen, <agulbra@nvg.unit.no>
 *
 *
 * Fixes:
 *		Alan Cox	:	Commented a couple of minor bits of surplus code
 *		Alan Cox	:	Undefining IP_FORWARD doesn't include the code
 *					(just stops a compiler warning).
 *		Alan Cox	:	Frames with >=MAX_ROUTE record routes, strict routes or loose routes
 *					are junked rather than corrupting things.
 *		Alan Cox	:	Frames to bad broadcast subnets are dumped
 *					We used to process them non broadcast and
 *					boy could that cause havoc.
 *		Alan Cox	:	ip_forward sets the free flag on the
 *					new frame it queues. Still crap because
 *					it copies the frame but at least it
 *					doesn't eat memory too.
 *		Alan Cox	:	Generic queue code and memory fixes.
 *		Fred Van Kempen :	IP fragment support (borrowed from NET2E)
 *		Gerhard Koerting:	Forward fragmented frames correctly.
 *		Gerhard Koerting:	Fixes to my fix of the above 8-).
 *		Gerhard Koerting:	IP interface addressing fix.
 *		Linus Torvalds	:	More robustness checks
 *		Alan Cox	:	Even more checks: Still not as robust as it ought to be
 *		Alan Cox	:	Save IP header pointer for later
 *		Alan Cox	:	ip option setting
 *		Alan Cox	:	Use ip_tos/ip_ttl settings
 *		Alan Cox	:	Fragmentation bogosity removed
 *					(Thanks to Mark.Bush@prg.ox.ac.uk)
 *		Dmitry Gorodchanin :	Send of a raw packet crash fix.
 *		Alan Cox	:	Silly ip bug when an overlength
 *					fragment turns up. Now frees the
 *					queue.
 *		Linus Torvalds/ :	Memory leakage on fragmentation
 *		Alan Cox	:	handling.
 *		Gerhard Koerting:	Forwarding uses IP priority hints
 *		Teemu Rantanen	:	Fragment problems.
 *		Alan Cox	:	General cleanup, comments and reformat
 *		Alan Cox	:	SNMP statistics
 *		Alan Cox	:	BSD address rule semantics. Also see
 *					UDP as there is a nasty checksum issue
 *					if you do things the wrong way.
 *		Alan Cox	:	Always defrag, moved IP_FORWARD to the config.in file
 *		Alan Cox	:	IP options adjust sk->priority.
 *		Pedro Roque	:	Fix mtu/length error in ip_forward.
 *		Alan Cox	:	Avoid ip_chk_addr when possible.
 *	Richard Underwood	:	IP multicasting.
 *		Alan Cox	:	Cleaned up multicast handlers.
 *		Alan Cox	:	RAW sockets demultiplex in the BSD style.
 *		Gunther Mayer	:	Fix the SNMP reporting typo
 *		Alan Cox	:	Always in group 224.0.0.1
 *	Pauline Middelink	:	Fast ip_checksum update when forwarding
 *					Masquerading support.
 *		Alan Cox	:	Multicast loopback error for 224.0.0.1
 *		Alan Cox	:	IP_MULTICAST_LOOP option.
 *		Alan Cox	:	Use notifiers.
 *		Bjorn Ekwall	:	Removed ip_csum (from slhc.c too)
 *		Bjorn Ekwall	:	Moved ip_fast_csum to ip.h (inline!)
 *		Stefan Becker	:	Send out ICMP HOST REDIRECT
 *	Arnt Gulbrandsen	:	ip_build_xmit
 *		Alan Cox	:	Per socket routing cache
 *		Alan Cox	:	Fixed routing cache, added header cache.
 *		Alan Cox	:	Loopback didn't work right in original ip_build_xmit - fixed it.
 *		Alan Cox	:	Only send ICMP_REDIRECT if src/dest are the same net.
 *		Alan Cox	:	Incoming IP option handling.
 *		Alan Cox	:	Set saddr on raw output frames as per BSD.
 *		Alan Cox	:	Stopped broadcast source route explosions.
 *		Alan Cox	:	Can disable source routing
 *		Takeshi Sone	:	Masquerading didn't work.
 *	Dave Bonn,Alan Cox	:	Faster IP forwarding whenever possible.
 *		Alan Cox	:	Memory leaks, tramples, misc debugging.
 *		Alan Cox	:	Fixed multicast (by popular demand 8))
 *		Alan Cox	:	Fixed forwarding (by even more popular demand 8))
 *		Alan Cox	:	Fixed SNMP statistics [I think]
 *	Gerhard Koerting	:	IP fragmentation forwarding fix
 *		Alan Cox	:	Device lock against page fault.
 *		Alan Cox	:	IP_HDRINCL facility.
 *	Werner Almesberger	:	Zero fragment bug
 *		Alan Cox	:	RAW IP frame length bug
 *		Alan Cox	:	Outgoing firewall on build_xmit
 *		A.N.Kuznetsov	:	IP_OPTIONS support throughout the kernel
 *		Alan Cox	:	Multicast routing hooks
 *		Jos Vos		:	Do accounting *before* call_in_firewall
 *	Willy Konynenberg	:	Transparent proxying support
 *
 *
 *
 * To Fix:
 *		IP fragmentation wants rewriting cleanly. The RFC815 algorithm is much more efficient
 *		and could be made very efficient with the addition of some virtual memory hacks to permit
 *		the allocation of a buffer that can then be 'grown' by twiddling page tables.
 *		Output fragmentation wants updating along with the buffer management to use a single
 *		interleaved copy algorithm so that fragmenting has a one copy overhead. Actual packet
 *		output should probably do its own fragmentation at the UDP/RAW layer. TCP shouldn't cause
 *		fragmentation anyway.
 *
 *		This program is free software; you can redistribute it and/or
 *		modify it under the terms of the GNU General Public License
 *		as published by the Free Software Foundation; either version
 *		2 of the License, or (at your option) any later version.
 */

#define pr_fmt(fmt) "IPv4: " fmt

#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/slab.h>

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sockios.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

#include <net/snmp.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/route.h>
#include <linux/skbuff.h>
#include <net/sock.h>
#include <net/arp.h>
#include <net/icmp.h>
#include <net/raw.h>
#include <net/checksum.h>
#include <linux/netfilter_ipv4.h>
#include <net/xfrm.h>
#include <linux/mroute.h>
#include <linux/netlink.h>

/*
 *	Process Router Attention IP option (RFC 2113)
 */
bool ip_call_ra_chain(struct sk_buff *skb)
{
	struct ip_ra_chain *ra;
	u8 protocol = ip_hdr(skb)->protocol;
	struct sock *last = NULL;
	struct net_device *dev = skb->dev;

	for (ra = rcu_dereference(ip_ra_chain); ra; ra = rcu_dereference(ra->next)) {
		struct sock *sk = ra->sk;

		/* If socket is bound to an interface, only report
		 * the packet if it came from that interface.
		 */
		if (sk && inet_sk(sk)->inet_num == protocol &&
		    (!sk->sk_bound_dev_if ||
		     sk->sk_bound_dev_if == dev->ifindex) &&
		    net_eq(sock_net(sk), dev_net(dev))) {
			if (ip_is_fragment(ip_hdr(skb))) {
				if (ip_defrag(skb, IP_DEFRAG_CALL_RA_CHAIN))
					return true;
			}
			if (last) {
				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
				if (skb2)
					raw_rcv(last, skb2);
			}
			last = sk;
		}
	}

	if (last) {
		raw_rcv(last, skb);
		return true;
	}
	return false;
}

static int ip_local_deliver_finish(struct sk_buff *skb)
{
	struct net *net = dev_net(skb->dev);

	__skb_pull(skb, ip_hdrlen(skb));

	/* Point into the IP datagram, just past the header. */
	skb_reset_transport_header(skb);

	rcu_read_lock();
	{
		int protocol = ip_hdr(skb)->protocol;
		const struct net_protocol *ipprot;
		int raw;

	resubmit:
		raw = raw_local_deliver(skb, protocol);

		ipprot = rcu_dereference(inet_protos[protocol]);
		if (ipprot != NULL) {
			int ret;

			if (!net_eq(net, &init_net) && !ipprot->netns_ok) {
				net_info_ratelimited("%s: proto %d isn't netns-ready\n",
						     __func__, protocol);
				kfree_skb(skb);
				goto out;
			}

			if (!ipprot->no_policy) {
				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
					kfree_skb(skb);
					goto out;
				}
				nf_reset(skb);
			}
			ret = ipprot->handler(skb);
			if (ret < 0) {
				protocol = -ret;
				goto resubmit;
			}
			IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
		} else {
			if (!raw) {
				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
					IP_INC_STATS_BH(net, IPSTATS_MIB_INUNKNOWNPROTOS);
					icmp_send(skb, ICMP_DEST_UNREACH,
						  ICMP_PROT_UNREACH, 0);
				}
			} else
				IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
			kfree_skb(skb);
		}
	}
 out:
	rcu_read_unlock();

	return 0;
}

/*
 *	Deliver IP Packets to the higher protocol layers.
 */
int ip_local_deliver(struct sk_buff *skb)
{
	/*
	 *	Reassemble IP fragments.
	 */

	if (ip_is_fragment(ip_hdr(skb))) {
		if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER))
			return 0;
	}

	return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
		       ip_local_deliver_finish);
}

static inline bool ip_rcv_options(struct sk_buff *skb)
{
	struct ip_options *opt;
	const struct iphdr *iph;
	struct net_device *dev = skb->dev;

	/* It looks as overkill, because not all
	   IP options require packet mangling.
	   But it is the easiest for now, especially taking
	   into account that combination of IP options
	   and running sniffer is extremely rare condition.
					      --ANK (980813)
	*/
	if (skb_cow(skb, skb_headroom(skb))) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	iph = ip_hdr(skb);
	opt = &(IPCB(skb)->opt);
	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);

	if (ip_options_compile(dev_net(dev), opt, skb)) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
		goto drop;
	}

	if (unlikely(opt->srr)) {
		struct in_device *in_dev = __in_dev_get_rcu(dev);

		if (in_dev) {
			if (!IN_DEV_SOURCE_ROUTE(in_dev)) {
				if (IN_DEV_LOG_MARTIANS(in_dev))
					net_info_ratelimited("source route option %pI4 -> %pI4\n",
							     &iph->saddr,
							     &iph->daddr);
				goto drop;
			}
		}

		if (ip_options_rcv_srr(skb))
			goto drop;
	}

	return false;
drop:
	return true;
}

int sysctl_ip_early_demux __read_mostly = 1;

static int ip_rcv_finish(struct sk_buff *skb)
{
	const struct iphdr *iph = ip_hdr(skb);
	struct rtable *rt;

	/*
	 *	Initialise the virtual path cache for the packet. It describes
	 *	how the packet travels inside Linux networking.
	 */
	if (skb_dst(skb) == NULL) {
		int err = -ENOENT;

		if (sysctl_ip_early_demux) {
			const struct net_protocol *ipprot;
			int protocol = iph->protocol;

			rcu_read_lock();
			ipprot = rcu_dereference(inet_protos[protocol]);
			if (ipprot && ipprot->early_demux)
				err = ipprot->early_demux(skb);
			rcu_read_unlock();
		}

		if (err) {
			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
						   iph->tos, skb->dev);
			if (unlikely(err)) {
				if (err == -EXDEV)
					NET_INC_STATS_BH(dev_net(skb->dev),
							 LINUX_MIB_IPRPFILTER);
				goto drop;
			}
		}
	}

#ifdef CONFIG_IP_ROUTE_CLASSID
	if (unlikely(skb_dst(skb)->tclassid)) {
		struct ip_rt_acct *st = this_cpu_ptr(ip_rt_acct);
		u32 idx = skb_dst(skb)->tclassid;
		st[idx&0xFF].o_packets++;
		st[idx&0xFF].o_bytes += skb->len;
		st[(idx>>16)&0xFF].i_packets++;
		st[(idx>>16)&0xFF].i_bytes += skb->len;
	}
#endif

	if (iph->ihl > 5 && ip_rcv_options(skb))
		goto drop;

	rt = skb_rtable(skb);
	if (rt->rt_type == RTN_MULTICAST) {
		IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INMCAST,
				skb->len);
	} else if (rt->rt_type == RTN_BROADCAST)
		IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
				skb->len);

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return NET_RX_DROP;
}

/*
 *	Main IP Receive routine.
 */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
	const struct iphdr *iph;
	u32 len;

	/* When the interface is in promisc. mode, drop all the crap
	 * that it receives, do not try to analyse it.
	 */
	if (skb->pkt_type == PACKET_OTHERHOST)
		goto drop;


	IP_UPD_PO_STATS_BH(dev_net(dev), IPSTATS_MIB_IN, skb->len);

	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto out;
	}

	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
		goto inhdr_error;

	iph = ip_hdr(skb);

	/*
	 *	RFC1122: 3.2.1.2 MUST silently discard any IP frame that fails the checksum.
	 *
	 *	Is the datagram acceptable?
	 *
	 *	1.	Length at least the size of an ip header
	 *	2.	Version of 4
	 *	3.	Checksums correctly. [Speed optimisation for later, skip loopback checksums]
	 *	4.	Doesn't have a bogus length
	 */

	if (iph->ihl < 5 || iph->version != 4)
		goto inhdr_error;

	if (!pskb_may_pull(skb, iph->ihl*4))
		goto inhdr_error;

	iph = ip_hdr(skb);

	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
		goto inhdr_error;

	len = ntohs(iph->tot_len);
	if (skb->len < len) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
		goto drop;
	} else if (len < (iph->ihl*4))
		goto inhdr_error;

	/* Our transport medium may have padded the buffer out. Now we know it
	 * is IP we can trim to the true length of the frame.
	 * Note this now means skb->len holds ntohs(iph->tot_len).
	 */
	if (pskb_trim_rcsum(skb, len)) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	/* Remove any debris in the socket control block */
	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));

	/* Must drop socket now because of tproxy. */
	skb_orphan(skb);

	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
		       ip_rcv_finish);

inhdr_error:
	IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
drop:
	kfree_skb(skb);
out:
	return NET_RX_DROP;
}