============
SNMP counter
============

This document explains the meaning of SNMP counters.

General IPv4 counters
=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.

* IpInReceives

Defined in `RFC1213 ipInReceives`_

.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26

The number of packets received by the IP layer. It is increased at the
beginning of the ip_rcv function and is always updated together with
IpExtInOctets. It is increased even if the packet is dropped
later (e.g. because the IP header is invalid or the checksum is wrong).
It indicates the number of aggregated segments after
GRO/LRO.

* IpInDelivers

Defined in `RFC1213 ipInDelivers`_

.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28

The number of packets delivered to the upper layer protocols, e.g. TCP, UDP,
ICMP and so on. If no one listens on a raw socket, only packets of kernel
supported protocols will be delivered; if someone listens on a raw
socket, all valid IP packets will be delivered.

* IpOutRequests

Defined in `RFC1213 ipOutRequests`_

.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28

The number of packets sent via the IP layer, for both unicast and
multicast packets. It is always updated together with
IpExtOutOctets.

* IpExtInOctets and IpExtOutOctets

They are Linux kernel extensions and have no RFC definitions. Please note
that RFC1213 does define ifInOctets and ifOutOctets, but they
are different things. ifInOctets and ifOutOctets include the MAC
layer header size, but IpExtInOctets and IpExtOutOctets don't; they
only include the IP layer header and the IP layer data.

* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts

They indicate the number of four kinds of ECN IP packets. Please refer to
`Explicit Congestion Notification`_ for more details.

.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6

These 4 counters count how many packets are received per ECN
status. They count the real frame number regardless of LRO/GRO. So
for the same packet, you might find that IpInReceives counts 1, but
IpExtInNoECTPkts counts 2 or more.

* IpInHdrErrors

Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is
dropped due to an IP header error. It might happen in both the IP input
and IP forward paths.

.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27

* IpInAddrErrors

Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two
scenarios: (1) The IP address is invalid. (2) The destination IP
address is not a local address and IP forwarding is not enabled.

.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27

* IpExtInNoRoutes

This counter means the packet is dropped when the IP stack receives a
packet and can't find a route for it in the route table. It might
happen when IP forwarding is enabled, the destination IP address is
not a local address and there is no route for the destination IP
address.

* IpInUnknownProtos

Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the
layer 4 protocol is unsupported by the kernel. If an application is using
a raw socket, the kernel will always deliver the packet to the raw socket
and this counter won't be increased.

.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27

* IpExtInTruncatedPkts

For an IPv4 packet, it means the actual data size is smaller than the
"Total Length" field in the IPv4 header.

* IpInDiscards

Defined in `RFC1213 ipInDiscards`_. It indicates the packet is dropped
in the IP receiving path due to kernel internal reasons (e.g. not
enough memory).

.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28

* IpOutDiscards

Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is
dropped in the IP sending path due to kernel internal reasons.

.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28

* IpOutNoRoutes

Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is
dropped in the IP sending path because no route is found for it.

.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29

ICMP counters
=============
* IcmpInMsgs and IcmpOutMsgs

Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_

.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43

As mentioned in RFC1213, these two counters include errors; they
would be increased even if the ICMP packet has an invalid type. The
ICMP output path will check the header of a raw socket, so
IcmpOutMsgs would still be updated if the IP header is constructed by
a userspace program.

* ICMP named types

| These counters include most of the common ICMP types. They are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
| IcmpInRedirects: `RFC1213 icmpInRedirects`_
| IcmpInEchos: `RFC1213 icmpInEchos`_
| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
| IcmpOutEchos: `RFC1213 icmpOutEchos`_
| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_

.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43

.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46

Every ICMP type has two counters: 'In' and 'Out'. E.g., for the ICMP
Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
straightforward.
The 'In' counter means the kernel receives such a packet
and the 'Out' counter means the kernel sends such a packet.

* ICMP numeric types

They are IcmpMsgInType[N] and IcmpMsgOutType[N], where [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definitions can be found in the `ICMP parameters`_
document.

.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml

For example, if the Linux kernel sends an ICMP Echo packet,
IcmpMsgOutType8 would increase by 1. And if the kernel gets an ICMP Echo Reply
packet, IcmpMsgInType0 would increase by 1.

* IcmpInCsumErrors

This counter indicates the checksum of the ICMP packet is
wrong. The kernel verifies the checksum after updating IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has a bad checksum,
IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated.

* IcmpInErrors and IcmpOutErrors

Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_

.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43

When an error occurs in the ICMP packet handler path, these two
counters would be updated. The receiving packet path uses IcmpInErrors
and the sending packet path uses IcmpOutErrors.
When IcmpInCsumErrors
is increased, IcmpInErrors would always be increased too.

Relationship of the ICMP counters
---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal to or larger than IcmpInMsgs. When the kernel
receives an ICMP packet, it follows the logic below:

1. increase IcmpInMsgs
2. if there is any error, update IcmpInErrors and finish the process
3. update IcmpMsgInType[N]
4. handle the packet depending on the type; if there is any error, update
   IcmpInErrors and finish the process

So if all errors occur in step (2), IcmpInMsgs should be equal to the
sum of IcmpMsgInType[N] plus IcmpInErrors. If all errors occur in
step (4), IcmpInMsgs should be equal to the sum of
IcmpMsgInType[N]. If the errors occur in both step (2) and step (4),
IcmpInMsgs should be less than the sum of IcmpMsgInType[N] plus
IcmpInErrors.

General TCP counters
====================
* TcpInSegs

Defined in `RFC1213 tcpInSegs`_

.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets received by the TCP layer. As mentioned in
RFC1213, it includes the packets received in error, such as checksum
error, invalid TCP header and so on.
Only one error won't be included:
if the layer 2 destination address is not the NIC's layer 2
address. It might happen if the packet is a multicast or broadcast
packet, or the NIC is in promiscuous mode. In these situations, the
packets would be delivered to the TCP layer, but the TCP layer will discard
these packets before increasing TcpInSegs. The TcpInSegs counter
isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter would only increase by 1.

* TcpOutSegs

Defined in `RFC1213 tcpOutSegs`_

.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets sent by the TCP layer. As mentioned in RFC1213,
it excludes the retransmitted packets, but it includes the SYN, ACK
and RST packets. Unlike TcpInSegs, TcpOutSegs is aware of
GSO, so if a packet would be split into 2 by GSO, TcpOutSegs will
increase by 2.

* TcpActiveOpens

Defined in `RFC1213 tcpActiveOpens`_

.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer sends a SYN and comes into the SYN-SENT
state. Every time TcpActiveOpens increases by 1, TcpOutSegs should always
increase by 1.

* TcpPassiveOpens

Defined in `RFC1213 tcpPassiveOpens`_

.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer receives a SYN, replies with a SYN+ACK and comes into
the SYN-RCVD state.

* TcpExtTCPRcvCoalesce

When packets are received by the TCP layer and have not been read by the
application, the TCP layer will try to merge them. This counter
indicates how many packets are merged in such a situation. If GRO is
enabled, lots of packets would be merged by GRO; these packets
wouldn't be counted in TcpExtTCPRcvCoalesce.

* TcpExtTCPAutoCorking

When sending packets, the TCP layer will try to merge small packets into
a bigger one. This counter increases by 1 for every packet merged in such a
situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/

* TcpExtTCPOrigDataSent

This counter is explained by `kernel commit f19c29e3e391`_. The
explanation is quoted below::

  TCPOrigDataSent: number of outgoing packets with original data (excluding
  retransmission but including data-in-SYN). This counter is different from
  TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
  more useful to track the TCP retransmission rate.
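
The quoted commit message suggests using TcpExtTCPOrigDataSent to track the
retransmission rate. A minimal sketch of that calculation follows. It assumes
two snapshots of the counters taken some time apart, and it assumes
TcpRetransSegs (RFC1213 tcpRetransSegs, which this document does not
describe) as the numerator. The helper name and the sample numbers are
hypothetical.

```python
def retransmission_rate(before, after):
    """Return retransmitted segments per original data segment.

    `before` and `after` are dicts mapping counter names to values,
    taken at two points in time. Returns None when no original data
    was sent in the interval.
    """
    orig_sent = after["TcpExtTCPOrigDataSent"] - before["TcpExtTCPOrigDataSent"]
    retrans = after["TcpRetransSegs"] - before["TcpRetransSegs"]
    if orig_sent <= 0:
        return None
    return retrans / orig_sent


# Hypothetical snapshots, for illustration only.
before = {"TcpExtTCPOrigDataSent": 1000, "TcpRetransSegs": 10}
after = {"TcpExtTCPOrigDataSent": 3000, "TcpRetransSegs": 30}
rate = retransmission_rate(before, after)
print(rate)
```

Working on the difference between two snapshots rather than on the absolute
values keeps the result tied to a time window, since the counters only grow
from boot.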

* TcpExtTCPSynRetrans

This counter is explained by `kernel commit f19c29e3e391`_. The
explanation is quoted below::

  TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
  retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

* TcpExtTCPFastOpenActiveFail

This counter is explained by `kernel commit f19c29e3e391`_. The
explanation is quoted below::

  TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
  the remote does not accept it or the attempts timed out.

.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd

* TcpExtListenOverflows and TcpExtListenDrops

When the kernel receives a SYN from a client and the TCP accept queue
is full, the kernel will drop the SYN and add 1 to TcpExtListenOverflows.
At the same time the kernel will also add 1 to TcpExtListenDrops. When a
TCP socket is in the LISTEN state and the kernel needs to drop a packet,
the kernel always adds 1 to TcpExtListenDrops. So an increase of
TcpExtListenOverflows always causes TcpExtListenDrops to increase at the
same time, but TcpExtListenDrops can also increase without
TcpExtListenOverflows increasing, e.g. a memory allocation failure would
also cause TcpExtListenDrops to increase.

Note: The above explanation is based on kernel 4.10 or later. On
an older kernel, the TCP stack behaves differently when the TCP accept
queue is full: it won't drop the SYN and will
complete the 3-way handshake. As the accept queue is full, the TCP
stack keeps the socket in the TCP half-open queue. As it is in the
half-open queue, the TCP stack sends SYN+ACK on an exponential backoff
timer. After the client replies with an ACK, the TCP stack checks whether
the accept queue is still full. If it is not full, the stack moves the
socket to the accept queue; if it is full, the stack keeps the socket in
the half-open queue, and the next time the client replies with an ACK,
this socket gets another chance to move to the accept queue.


TCP Fast Path
=============
When the kernel receives a TCP packet, it has two paths to handle the
packet: one is the fast path, the other is the slow path. The comment in the
kernel code provides a good explanation of them, quoted below::

  It is split into a fast path and a slow path. The fast path is
  disabled when:

  - A zero window was announced from us
  - zero window probing
    is only handled properly on the slow path.
  - Out of order segments arrived.
  - Urgent data is expected.
  - There is no buffer space left
  - Unexpected TCP flags/window values/header lengths are received
    (detected by checking the TCP header against pred_flags)
  - Data is sent in both directions. The fast path only supports pure senders
    or pure receivers (this means either the sequence number or the ack
    value must stay constant)
  - Unexpected TCP option.

The kernel will try to use the fast path unless any of the above conditions
is satisfied. If the packets are out of order, the kernel will handle
them in the slow path, which means the performance might not be very
good. The kernel would also come into the slow path if "Delayed ack" is
used, because when using "Delayed ack", the data is sent in both
directions. When the TCP window scale option is not used, the kernel will
try to enable the fast path immediately when the connection comes into the
established state, but if the TCP window scale option is used, the kernel
will disable the fast path at first, and try to enable it after the kernel
receives packets.

* TcpExtTCPPureAcks and TcpExtTCPHPAcks

If a packet has the ACK flag set and has no data, it is a pure ACK packet. If
the kernel handles it in the fast path, TcpExtTCPHPAcks will increase by 1;
if the kernel handles it in the slow path, TcpExtTCPPureAcks will
increase by 1.

* TcpExtTCPHPHits

If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will
increase by 1.
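
Because TcpExtTCPHPAcks and TcpExtTCPPureAcks split pure ACKs between the
fast path and the slow path, their ratio gives a rough idea of how often
pure ACKs take the fast path. The sketch below assumes the counters are
already available as a dict; the function name and the sample values are
hypothetical.

```python
def pure_ack_fast_path_share(counters):
    """Return the fraction of pure ACKs handled in the fast path.

    Returns None when no pure ACK has been counted at all.
    """
    fast = counters["TcpExtTCPHPAcks"]      # pure ACKs handled in the fast path
    slow = counters["TcpExtTCPPureAcks"]    # pure ACKs handled in the slow path
    total = fast + slow
    if total == 0:
        return None
    return fast / total


# Hypothetical values, for illustration only.
sample = {"TcpExtTCPHPAcks": 750, "TcpExtTCPPureAcks": 250}
share = pure_ack_fast_path_share(sample)
print(share)
```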


TCP abort
=========

* TcpExtTCPAbortOnData

It means the TCP layer has data in flight, but needs to close the
connection. So the TCP layer sends a RST to the other side, indicating the
connection is not closed gracefully. An easy way to increase this
counter is using the SO_LINGER option. Please refer to the SO_LINGER
section of the `socket man page`_:

.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html

By default, when an application closes a connection, the close function
will return immediately and the kernel will try to send the in-flight data
asynchronously. If you use the SO_LINGER option, set l_onoff to 1, and l_linger
to a positive number, the close function won't return immediately, but
will wait until the in-flight data is acked by the other side; the max wait
time is l_linger seconds. If you set l_onoff to 1 and l_linger to 0,
when the application closes a connection, the kernel will send a RST
immediately and increase the TcpExtTCPAbortOnData counter.

* TcpExtTCPAbortOnClose

This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
the kernel will send a RST to the other side of the TCP connection.
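
The SO_LINGER setting described under TcpExtTCPAbortOnData (l_onoff set to 1,
l_linger set to 0) can be sketched as follows. This only shows how the option
is set on a socket and read back; it does not open a connection, so it does
not by itself change any counter. The helper names are hypothetical, and the
"ii" layout assumes that struct linger consists of two C ints, as it does on
Linux.

```python
import socket
import struct

LINGER_FORMAT = "ii"  # struct linger: l_onoff, l_linger (two C ints on Linux)


def set_abortive_close(sock):
    """Enable SO_LINGER with l_onoff=1 and l_linger=0 on the socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, 0))


def get_linger(sock):
    """Return the current (l_onoff, l_linger) pair of the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_abortive_close(sock)
linger = get_linger(sock)
sock.close()
print(linger)
```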

* TcpExtTCPAbortOnMemory

When an application closes a TCP connection, the kernel still needs to track
the connection and let it complete the TCP disconnect process. E.g. an
app calls the close method of a socket, the kernel sends a FIN to the other
side of the connection, and then the app has no relationship with the
socket any more. But the kernel needs to keep the socket; this socket
becomes an orphan socket. The kernel waits for the reply of the other side,
and the socket would finally come to the TIME_WAIT state. When the kernel
doesn't have enough memory to keep the orphan socket, the kernel would send
a RST to the other side and delete the socket. In such a situation, the
kernel will add 1 to TcpExtTCPAbortOnMemory. Two conditions would trigger
TcpExtTCPAbortOnMemory:

1. the memory used by the TCP protocol is higher than the third value of
   tcp_mem. Please refer to the tcp_mem section in the `TCP man page`_.
2. the orphan socket count is higher than net.ipv4.tcp_max_orphans

.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html


* TcpExtTCPAbortOnTimeout

This counter will increase when any of the TCP timers expire. In such a
situation, the kernel won't send a RST; it just gives up the connection.

* TcpExtTCPAbortOnLinger

When a TCP connection comes into the FIN_WAIT_2 state, instead of waiting
for the FIN packet from the other side, the kernel could send a RST and
delete the socket immediately.
This is not the default behavior of the
Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let the kernel follow this behavior.

* TcpExtTCPAbortFailed

The kernel TCP layer will send a RST if the conditions in
`RFC2525 2.17 section`_ are satisfied. If an internal error occurs during
this process, TcpExtTCPAbortFailed will be increased.

.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50

TCP Hybrid Slow Start
=====================
The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is
approached. The two pieces of information are the ACK train length and
the increase in packet delay. For detailed information, please refer to the
`Hybrid Slow Start paper`_. When either the ACK train length or the packet
delay hits a specific threshold, the congestion control algorithm will come
into the Congestion Avoidance state. As of v4.20, two congestion
control algorithms use Hybrid Slow Start: cubic (the
default congestion control algorithm) and cdg. Four SNMP counters
relate to the Hybrid Slow Start algorithm.

.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf

* TcpExtTCPHystartTrainDetect

How many times the ACK train length threshold is detected.

* TcpExtTCPHystartTrainCwnd

The sum of CWND detected by ACK train length. Dividing this value by
TcpExtTCPHystartTrainDetect gives the average CWND detected by the
ACK train length.

* TcpExtTCPHystartDelayDetect

How many times the packet delay threshold is detected.

* TcpExtTCPHystartDelayCwnd

The sum of CWND detected by packet delay. Dividing this value by
TcpExtTCPHystartDelayDetect gives the average CWND detected by the
packet delay.

TCP retransmission and congestion control
=========================================
The TCP protocol has two retransmission mechanisms: SACK and fast
recovery. They are mutually exclusive. When SACK is enabled,
the kernel TCP stack uses SACK; otherwise it uses fast
recovery. SACK is a TCP option, which is defined in `RFC2018`_;
fast recovery is defined in `RFC6582`_ and is also called
'Reno'.

TCP congestion control is a big and complex topic. To understand
the related SNMP counters, we need to know the states of the congestion
control state machine. There are 5 states: Open, Disorder, CWR,
Recovery and Loss.
For details about these states, please refer to page 5
and page 6 of this document:
https://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf

.. _RFC2018: https://tools.ietf.org/html/rfc2018
.. _RFC6582: https://tools.ietf.org/html/rfc6582

* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery

When the congestion control comes into the Recovery state, if SACK is
used, TcpExtTCPSackRecovery increases by 1; if SACK is not used,
TcpExtTCPRenoRecovery increases by 1. These two counters mean the TCP
stack begins to retransmit the lost packets.

* TcpExtTCPSACKReneging

A packet was acknowledged by SACK, but the receiver has dropped this
packet, so the sender needs to retransmit this packet. In this
situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver
could drop a packet which has been acknowledged by SACK; although it is
unusual, it is allowed by the TCP protocol. The sender doesn't really
know what happened on the receiver side. The sender just waits until
the RTO expires for this packet, then the sender assumes this packet
has been dropped by the receiver.

* TcpExtTCPRenoReorder

The reorder packet is detected by fast recovery. It would only be used
if SACK is disabled. The fast recovery algorithm detects reorder by
the duplicate ACK number.
E.g., if a retransmission is triggered, and
the original packet is not lost but just out of order, the receiver
acknowledges multiple times: once for the retransmitted packet and
once for the arrival of the original out of order packet. Thus the
sender sees more ACKs than it expects, and knows that reordering has
occurred.

* TcpExtTCPTSReorder

A reordered packet is detected when a hole is filled. E.g., assume the
sender sends packets 1,2,3,4,5, and the receiving order is
1,2,4,5,3. When the sender receives the ACK of packet 3 (which fills
the hole), either of two conditions makes TcpExtTCPTSReorder increase
by 1: (1) packet 3 has not been retransmitted yet; (2) packet 3 has
been retransmitted, but the timestamp of packet 3's ACK is earlier than
the retransmission timestamp.

* TcpExtTCPSACKReorder

A reordered packet is detected by SACK. SACK has two methods to detect
reordering: (1) A DSACK is received by the sender. It means the sender
sent the same packet more than once, and the only reason is that the
sender believed an out of order packet was lost and sent it again.
(2) Assume packets 1,2,3,4,5 are sent by the sender, and the sender has
received SACKs for packets 2 and 5. If the sender now receives a SACK
for packet 4 and has not retransmitted that packet yet, the sender
knows packet 4 is out of order. The kernel TCP stack increases
TcpExtTCPSACKReorder in both of the above scenarios.
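
As a rough illustration of the two TcpExtTCPTSReorder conditions
described above, the following sketch models the decision taken when the
ACK that fills a hole arrives. The function name and its parameters are
hypothetical and exist only for this example; this is not the kernel's
implementation.

```python
from typing import Optional


def ts_reorder_detected(retrans_ts: Optional[int], ack_echo_ts: int) -> bool:
    """Model of the two TcpExtTCPTSReorder conditions (illustrative only).

    retrans_ts:  time at which the hole-filling packet was retransmitted,
                 or None if it has not been retransmitted.
    ack_echo_ts: timestamp carried by the ACK that fills the hole
                 (assumed to use the same clock and units as retrans_ts).
    """
    if retrans_ts is None:
        # Condition (1): the packet was never retransmitted, so the ACK
        # can only belong to the original, late-arriving packet.
        return True
    # Condition (2): the ACK's timestamp predates the retransmission, so
    # it acknowledges the original packet, not the retransmitted copy.
    return ack_echo_ts < retrans_ts
```

For instance, ``ts_reorder_detected(None, 100)`` models condition (1),
and ``ts_reorder_detected(200, 150)`` models condition (2).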


DSACK
=====
DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
duplicate packets to the sender. There are two kinds of
duplications: (1) a packet which has been acknowledged is
duplicated; (2) an out of order packet is duplicated. The TCP stack
counts these two kinds of duplications on both the receiver side and
the sender side.

.. _RFC2883: https://tools.ietf.org/html/rfc2883

* TcpExtTCPDSACKOldSent

The TCP stack receives a duplicate packet which has been acked, so it
sends a DSACK to the sender.

* TcpExtTCPDSACKOfoSent

The TCP stack receives an out of order duplicate packet, so it sends a
DSACK to the sender.

* TcpExtTCPDSACKRecv

The TCP stack receives a DSACK, which indicates that an acknowledged
duplicate packet was received.

* TcpExtTCPDSACKOfoRecv

The TCP stack receives a DSACK, which indicates that an out of order
duplicate packet was received.

TCP out of order
================
* TcpExtTCPOFOQueue

The TCP layer receives an out of order packet and has enough memory
to queue it.

* TcpExtTCPOFODrop

The TCP layer receives an out of order packet but doesn't have enough
memory, so it drops the packet.
Such packets won't be counted into
TcpExtTCPOFOQueue.

* TcpExtTCPOFOMerge

The received out of order packet overlaps with the previous
packet. The overlapping part will be dropped. All TcpExtTCPOFOMerge
packets will also be counted into TcpExtTCPOFOQueue.

TCP PAWS
========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets. It depends on the TCP
timestamps. For detailed information, please refer to the
`timestamp wiki`_ and the `RFC of PAWS`_.

.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps

* TcpExtPAWSActive

Packets are dropped by PAWS in Syn-Sent status.

* TcpExtPAWSEstab

Packets are dropped by PAWS in any status other than Syn-Sent.

TCP ACK skip
============
In some scenarios, the kernel avoids sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When the kernel decides to skip an
ACK due to tcp_invalid_ratelimit, it updates one of the counters below
to indicate in which scenario the ACK is skipped. The ACK is only
skipped if the received packet is either a SYN packet or has no data.

.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

* TcpExtTCPACKSkippedSynRecv

The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack has received a SYN and replied with a SYN+ACK. Now the TCP
stack is waiting for an ACK. Generally, the TCP stack doesn't need to
send an ACK in the Syn-Recv status. But in several scenarios, the TCP
stack needs to send an ACK. E.g., the TCP stack receives the same SYN
packet repeatedly, the received packet does not pass the PAWS check, or
the received packet's sequence number is out of window. If the ACK
sending frequency is higher than tcp_invalid_ratelimit allows, the TCP
stack skips sending the ACK and increases TcpExtTCPACKSkippedSynRecv.


* TcpExtTCPACKSkippedPAWS

The ACK is skipped because the PAWS (Protection Against Wrapped
Sequence numbers) check fails. If the PAWS check fails in Syn-Recv,
Fin-Wait-2 or Time-Wait status, the skipped ACK is counted in
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait respectively. In all other statuses, the
skipped ACK is counted in TcpExtTCPACKSkippedPAWS.

* TcpExtTCPACKSkippedSeq

The sequence number is out of window, the timestamp passes the PAWS
check, and the TCP status is not Syn-Recv, Fin-Wait-2 or Time-Wait.
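
The counter selection described so far can be summarized with a small
lookup. This is a sketch of the rules as stated in this section, not
kernel code; the function name, the state strings and the reason
strings are assumptions made for this example.

```python
# States that have their own "ACK skipped" counter, per the text above.
STATE_COUNTERS = {
    "Syn-Recv": "TcpExtTCPACKSkippedSynRecv",
    "Fin-Wait-2": "TcpExtTCPACKSkippedFinWait2",
    "Time-Wait": "TcpExtTCPACKSkippedTimeWait",
}


def skipped_ack_counter(state: str, reason: str) -> str:
    """Return the counter bumped when a rate-limited ACK is skipped.

    reason is "paws" (PAWS check failed) or "seq" (sequence number out
    of window); both labels are hypothetical names for this sketch.
    """
    if reason not in ("paws", "seq"):
        raise ValueError("unknown reason: %r" % reason)
    if state in STATE_COUNTERS:
        # The state-specific counter takes precedence over the reason.
        return STATE_COUNTERS[state]
    if reason == "paws":
        return "TcpExtTCPACKSkippedPAWS"
    return "TcpExtTCPACKSkippedSeq"
```

For example, a PAWS failure on an established connection maps to
TcpExtTCPACKSkippedPAWS, while the same failure in Time-Wait maps to
TcpExtTCPACKSkippedTimeWait.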

* TcpExtTCPACKSkippedFinWait2

The ACK is skipped in Fin-Wait-2 status. The reason is either that the
PAWS check fails or that the received sequence number is out of window.

* TcpExtTCPACKSkippedTimeWait

The ACK is skipped in Time-Wait status. The reason is either that the
PAWS check fails or that the received sequence number is out of window.

* TcpExtTCPACKSkippedChallenge

The ACK is skipped if the ACK is a challenge ACK. RFC 5961 defines
3 kinds of challenge ACK, please refer to `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, in some TCP statuses the Linux TCP stack also sends
challenge ACKs if the ACK number is before the first unacknowledged
number (stricter than `RFC 5961 section 5.2`_).

.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11


examples
========

ping test
---------
Run the ping command against the public dns server 8.8.8.8::

  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms

  --- 8.8.8.8 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms

The nstat result::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0

The Linux server sent an ICMP Echo packet, so IpOutRequests,
IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased by 1. The
server got an ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
IcmpInEchoReps and IcmpMsgInType0 were increased by 1. The ICMP Echo
Reply was passed to the ICMP layer via the IP layer, so IpInDelivers was
increased by 1. The default ping data size is 56 bytes (the "56(84)" in
the ping output), so an ICMP Echo packet and its corresponding Echo
Reply packet consist of:

* 14 bytes MAC header
* 20 bytes IP header
* 8 bytes ICMP header
* 56 bytes data (default value of the ping command)

So IpExtInOctets and IpExtOutOctets are 20+8+56=84.
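
The octet arithmetic above can be double-checked with a few lines of
Python. This is only a sketch: the constant names are made up for the
example, and the 64-byte ICMP message size (ICMP header plus ping
payload) is taken as the 84-byte total minus the 20-byte IP header.

```python
MAC_HEADER = 14    # Ethernet header, not counted by the IpExt*Octets counters
IP_HEADER = 20     # IPv4 header without options
ICMP_MESSAGE = 64  # ICMP header plus ping payload (84 - 20)

# IpExtInOctets / IpExtOutOctets count the IP header and everything above it.
ip_octets = IP_HEADER + ICMP_MESSAGE

# An interface-level byte counter would also include the MAC header.
frame_octets = MAC_HEADER + ip_octets

print(ip_octets, frame_octets)  # 84 98
```

The 14-byte difference between the two results is the MAC header, which
is why interface-level octet counters and the IpExt octet counters
disagree for the same packet.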

tcp 3-way handshake
-------------------
On the server side, we run::

  nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client side, we run::

  nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
  Connection to 192.168.122.251 9000 port [tcp/*] succeeded!

The server listened on TCP port 9000, the client connected to it, and
they completed the 3-way handshake.

On the server side, we can find the nstat output below::

  nstatuser@nstat-b:~$ nstat | grep -i tcp
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0

On the client side, we can find the nstat output below::

  nstatuser@nstat-a:~$ nstat | grep -i tcp
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0

When the server received the first SYN, it replied with a SYN+ACK and
entered the SYN-RCVD state, so TcpPassiveOpens increased by 1. The
server received a SYN, sent a SYN+ACK and received an ACK, so it sent 1
packet and received 2 packets: TcpInSegs increased by 2 and TcpOutSegs
increased by 1. The last ACK of the 3-way handshake is a pure ACK
without data, so TcpExtTCPPureAcks increased by 1.
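
The segment accounting for the handshake can be tallied with a short
script. This is a simplified model, not a measurement: the event list
and the dictionary layout are assumptions chosen for the example.

```python
from collections import Counter

# The three segments of the handshake as (sender, segment) pairs.
HANDSHAKE = [("client", "SYN"), ("server", "SYN+ACK"), ("client", "ACK")]


def tally(events):
    """Count sent and received segments for each side of a two-host exchange."""
    counts = {"client": Counter(), "server": Counter()}
    for sender, _segment in events:
        receiver = "server" if sender == "client" else "client"
        counts[sender]["TcpOutSegs"] += 1
        counts[receiver]["TcpInSegs"] += 1
    return counts


counts = tally(HANDSHAKE)
print(dict(counts["server"]))  # {'TcpInSegs': 2, 'TcpOutSegs': 1}
print(dict(counts["client"]))  # {'TcpOutSegs': 2, 'TcpInSegs': 1}
```

The totals match the TcpInSegs and TcpOutSegs values in the two nstat
outputs of this test.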

When the client sent the SYN, it entered the SYN-SENT state, so
TcpActiveOpens increased by 1. The client sent a SYN, received a
SYN+ACK and sent an ACK, so it sent 2 packets and received 1 packet:
TcpInSegs increased by 1 and TcpOutSegs increased by 2.

TCP normal traffic
------------------
Run nc on server::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

Run nc on client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Input a string in the nc client ('hello' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello

The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Input a string in the nc client again ('world' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
  world

Client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPAcks                 1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0


Server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPHits                 1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Comparing the first and the second client-side nstat outputs, we can
find one difference: the first one had a 'TcpExtTCPPureAcks', but the
second one had a 'TcpExtTCPHPAcks'. The first and the second
server-side nstat outputs had a difference too: the second one had a
'TcpExtTCPHPHits', but the first one didn't have it. The network
traffic patterns were exactly the same: the client sent a packet to the
server, and the server replied with an ACK. But the kernel handled them
in different ways.
When the
TCP window scale option is not used, the kernel tries to enable the
fast path immediately when the connection enters the established
state. If the TCP window scale option is used, the kernel disables the
fast path at first, and tries to enable it after the kernel receives
packets. We can use the 'ss' command to verify whether the window
scale option is used. E.g., run the command below on either the server
or the client::

  nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )'
  Netid    Recv-Q   Send-Q            Local Address:Port             Peer Address:Port
  tcp      0        0               192.168.122.250:40654         192.168.122.251:9000
             ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98

The 'wscale:7,7' means both the server and the client set the window
scale option to 7. Now we can explain the nstat output in our test:

In the first nstat output of the client side, the client sent a packet
and the server replied with an ACK. When the kernel handled this ACK,
the fast path was not enabled, so the ACK was counted into
'TcpExtTCPPureAcks'.

In the second nstat output of the client side, the client sent a packet
again and received another ACK from the server. This time, the fast
path was enabled and the ACK qualified for it, so the ACK was handled
by the fast path and counted into 'TcpExtTCPHPAcks'.

In the first nstat output of the server side, the fast path was not
enabled, so there was no 'TcpExtTCPHPHits'.

In the second nstat output of the server side, the fast path was
enabled, and the packet received from the client qualified for the fast
path, so it was counted into 'TcpExtTCPHPHits'.

TcpExtTCPAbortOnClose
---------------------
On the server side, we run the Python script below::

  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  # Keep the connection open without ever reading from it.
  while True:
      time.sleep(9999999)

This Python script listens on port 9000, but doesn't read anything from
the connection.

On the client side, we send the string "hello" by nc::

  nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000

Then, we come back to the server side. The server has received the
"hello" packet, and the TCP layer has acked this packet, but the
application hasn't read it yet. We type Ctrl-C to terminate the server
script. Then we can find that TcpExtTCPAbortOnClose increased by 1 on
the server side::

  nstatuser@nstat-b:~$ nstat | grep -i abort
  TcpExtTCPAbortOnClose           1                  0.0

If we run tcpdump on the server side, we can see that the server sent a
RST after we typed Ctrl-C.
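
The nstat tool reports counter changes since its previous run. When
absolute values are wanted, the TcpExt counters can also be read from
/proc/net/netstat, which stores them as pairs of header and value
lines. The parser below is a minimal sketch; the function name is
hypothetical and the sample text is shortened, made-up data in that
layout.

```python
def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat style text into {"<Prefix><Name>": value}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Prefix: names..." followed by "Prefix: values...".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[prefix + name] = int(number)
    return counters


# Shortened sample in the same layout (the values are invented).
SAMPLE = """\
TcpExt: TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory
TcpExt: 0 1 0
IpExt: InOctets OutOctets
IpExt: 58 52
"""

counters = parse_netstat(SAMPLE)
print(counters["TcpExtTCPAbortOnClose"])  # 1
```

On a Linux host, passing the contents of /proc/net/netstat to
``parse_netstat`` would return the live values under the same names
that nstat prints.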

TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
---------------------------------------------------
Below is an example which lets the orphan socket count exceed
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on the client::

  sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"

Client code (create 64 connections to the server)::

  nstatuser@nstat-a:~$ cat client_orphan.py
  import socket
  import time

  server = 'nstat-b' # server address
  port = 9000

  count = 64

  connection_list = []

  # Open the connections and keep them referenced so they stay open.
  for i in range(count):
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect((server, port))
      connection_list.append(s)
      print("connection_count: %d" % len(connection_list))

  while True:
      time.sleep(99999)

Server code (accept 64 connections from the client)::

  nstatuser@nstat-b:~$ cat server_orphan.py
  import socket
  import time

  port = 9000
  count = 64

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(count)
  connection_list = []
  # Accept forever and keep every accepted socket open.
  while True:
      sock, addr = s.accept()
      connection_list.append((sock, addr))
      print("connection_count: %d" % len(connection_list))

Run the Python scripts on the server and the client.

On server::

  python3 server_orphan.py

On client::

  python3 client_orphan.py

Run iptables on server::

  sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP

Type Ctrl-C on the client to stop client_orphan.py.

Check TcpExtTCPAbortOnMemory on client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnMemory          54                 0.0

Check the orphan socket count on client::

  nstatuser@nstat-a:~$ ss -s
  Total: 131 (kernel 0)
  TCP:   14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0

  Transport Total     IP        IPv6
  *         0         -         -
  RAW       1         0         1
  UDP       1         1         0
  TCP       14        13        1
  INET      16        14        2
  FRAG      0         0         0

The explanation of the test: after running server_orphan.py and
client_orphan.py, we have 64 connections between the server and the
client. After the iptables command is run, the server drops all packets
from the client. When we type Ctrl-C on client_orphan.py, the client
system tries to close these connections, and before they are closed
gracefully, these connections become orphan sockets.
As the iptables
rule on the server blocked packets from the client, the server won't
receive the FIN from the client, so all connections on the client are
stuck in the FIN_WAIT_1 state and remain orphan sockets until they time
out. We have echoed 10 to /proc/sys/net/ipv4/tcp_max_orphans, so the
client system only keeps 10 orphan sockets. For all other orphan
sockets, the client system sends a RST and deletes them. We have 64
connections, so the 'ss -s' command shows the system has 10 orphan
sockets, and the value of TcpExtTCPAbortOnMemory is 54.

An additional explanation about the orphan socket count: you can find
the exact orphan socket count with the 'ss -s' command, but when the
kernel decides whether to increase TcpExtTCPAbortOnMemory and send a
RST, it doesn't always check the exact orphan socket count. To improve
performance, the kernel checks an approximate count first, and only if
the approximate count is more than tcp_max_orphans does it check the
exact count. So if the approximate count is less than tcp_max_orphans
but the exact count is more than tcp_max_orphans, you will find that
TcpExtTCPAbortOnMemory is not increased at all. If tcp_max_orphans is
large enough, this won't occur, but if you decrease tcp_max_orphans to
a small value as in our test, you might run into this issue. That is
why the client in our test set up 64 connections although
tcp_max_orphans is 10. If the client only set up 11 connections, we
might not see any change of TcpExtTCPAbortOnMemory.
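
The two-stage check described above can be modeled as follows. This is
a sketch of the paragraph's logic only; the function and parameter
names are hypothetical, and the kernel's real bookkeeping is more
involved.

```python
def too_many_orphans(approx_count: int, exact_count: int, max_orphans: int) -> bool:
    """Decide whether an orphan socket is aborted (TcpExtTCPAbortOnMemory)."""
    if approx_count <= max_orphans:
        # Fast path: the cheap, approximate count is within the limit,
        # so the exact count is never consulted.
        return False
    # Slow path: confirm with the exact count before aborting.
    return exact_count > max_orphans


# With tcp_max_orphans = 10, a stale approximate count hides the 11th orphan.
print(too_many_orphans(approx_count=8, exact_count=11, max_orphans=10))   # False
# Once the approximate count exceeds the limit, the exact count decides.
print(too_many_orphans(approx_count=12, exact_count=11, max_orphans=10))  # True
```

In the idealized case where both counts agree, 64 orphaned connections
with a limit of 10 leave 64 - 10 = 54 aborted sockets, which matches
the TcpExtTCPAbortOnMemory value in the nstat output.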

To continue the previous test, we wait for several minutes. Because the
iptables rule on the server blocked the traffic, the server doesn't
receive the FIN, and all the client's orphan sockets finally time out
in the FIN_WAIT_1 state. So after waiting for a few minutes, we can
find 10 timeouts on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnTimeout         10                 0.0

TcpExtTCPAbortOnLinger
----------------------
The server side code::

  nstatuser@nstat-b:~$ cat server_linger.py
  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

The client side code::

  nstatuser@nstat-a:~$ cat client_linger.py
  import socket
  import struct

  server = 'nstat-b' # server address
  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  # struct linger: l_onoff=1 (enabled), l_linger=10 seconds
  s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
  s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
  s.connect((server, port))
  s.close()

Run server_linger.py on server::

  nstatuser@nstat-b:~$ python3 server_linger.py

Run client_linger.py on client::

  nstatuser@nstat-a:~$ python3 client_linger.py

After running client_linger.py, check the output of nstat::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnLinger          1                  0.0

TcpExtTCPRcvCoalesce
--------------------
On the server, we run a program which listens on TCP port 9000, but
doesn't read any data::

  import socket
  import time
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

Save the above code as server_coalesce.py, and run::

  python3 server_coalesce.py

On the client, save the code below as client_coalesce.py::

  import socket
  server = 'nstat-b'
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((server, port))

Run::

  nstatuser@nstat-a:~$ python3 -i client_coalesce.py

We use '-i' to enter the interactive mode, then send a packet::

  >>> s.send(b'foo')
  3

Send a packet again::

  >>> s.send(b'bar')
  3

On the server, run nstat::

  ubuntu@nstat-b:~$ nstat
  #kernel
  IpInReceives                    2                  0.0
  IpInDelivers                    2                  0.0
  IpOutRequests                   2                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      2                  0.0
  TcpExtTCPRcvCoalesce            1                  0.0
  IpExtInOctets                   110                0.0
  IpExtOutOctets                  104                0.0
  IpExtInNoECTPkts                2                  0.0

The client sent two packets, and the server didn't read any data. When
the second packet arrived at the server, the first packet was still in
the receive queue. So the TCP layer merged the two packets, and we
can see that TcpExtTCPRcvCoalesce increased by 1.

TcpExtListenOverflows and TcpExtListenDrops
-------------------------------------------
On the server, run the nc command, listening on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client, run 3 nc commands in different terminals::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

The nc command only accepts 1 connection, and the accept queue length
is 1. In the current Linux implementation, setting the queue length to
n means the actual queue length is n+1. Now we have created 3
connections: 1 is accepted by nc and 2 are in the accept queue, so the
accept queue is full.

Before running the 4th nc, we clean the nstat history on the server::

  nstatuser@nstat-b:~$ nstat -n

Run the 4th nc on the client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000

If the nc server is running on kernel 4.10 or a later version, you
won't see the "Connection to ... succeeded!" string, because the kernel
drops the SYN if the accept queue is full. If the nc server is running
on an older kernel, you would see that the connection succeeded,
because the kernel would complete the 3-way handshake and keep the
socket on the half-open queue. This test was done on kernel 4.15. Below
is the nstat output on the server::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    4                  0.0
  IpInDelivers                    4                  0.0
  TcpInSegs                       4                  0.0
  TcpExtListenOverflows           4                  0.0
  TcpExtListenDrops               4                  0.0
  IpExtInOctets                   240                0.0
  IpExtInNoECTPkts                4                  0.0

Both TcpExtListenOverflows and TcpExtListenDrops were 4. If the time
between the 4th nc and the nstat was longer, the values of
TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped and the client kept retrying.
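
When comparing counters such as these across runs, it can help to load
the nstat output into a dictionary. Below is a minimal sketch, assuming
the three-column layout shown above (name, value, rate); the helper
name ``parse_nstat`` is hypothetical and not part of any tool.

```python
def parse_nstat(text):
    """Parse nstat output into {counter_name: value}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "#kernel" header and shell prompt lines.
        if len(fields) != 3 or fields[0].startswith("#"):
            continue
        name, value, _rate = fields
        if value.isdigit():
            counters[name] = int(value)
    return counters


# Sample copied from the server output in this section.
SAMPLE = """\
#kernel
IpInReceives                    4                  0.0
IpInDelivers                    4                  0.0
TcpInSegs                       4                  0.0
TcpExtListenOverflows           4                  0.0
TcpExtListenDrops               4                  0.0
IpExtInOctets                   240                0.0
IpExtInNoECTPkts                4                  0.0
"""

counters = parse_nstat(SAMPLE)
# In this test every overflow is also counted as a drop.
print(counters["TcpExtListenOverflows"], counters["TcpExtListenDrops"])
```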

IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
-------------------------------------------------
In this test, server A has the IP address 192.168.122.250 and server B
has the IP address 192.168.122.251.

Prepare on server A: add a route to 8.8.8.8 via server B::

  $ sudo ip route add 8.8.8.8/32 via 192.168.122.251

Prepare on server B: disable send_redirects for all interfaces::

  $ sudo sysctl -w net.ipv4.conf.all.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.ens3.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.lo.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.default.send_redirects=0

We want server A to send a packet to 8.8.8.8 and route the packet
through server B. When server B receives such a packet, it might send an
ICMP Redirect message to server A; setting send_redirects to 0 disables
this behavior.

First, generate IpInAddrErrors.
On server B, we disable IP forwarding::

  $ sudo sysctl -w net.ipv4.conf.all.forwarding=0

On server A, we send packets to 8.8.8.8::

  $ nc -v 8.8.8.8 53

On server B, we check the output of nstat::

  $ nstat
  #kernel
  IpInReceives                    3                  0.0
  IpInAddrErrors                  3                  0.0
  IpExtInOctets                   180                0.0
  IpExtInNoECTPkts                3                  0.0

As we have let server A route 8.8.8.8 via server B, and we disabled IP
forwarding on server B, server A sent packets to server B, then server B
dropped the packets and increased IpInAddrErrors. As the nc command
re-sends the SYN packet if it doesn't receive a SYN+ACK, we could find
multiple IpInAddrErrors.

Second, generate IpExtInNoRoutes.
On server B, we enable IP
forwarding::

  $ sudo sysctl -w net.ipv4.conf.all.forwarding=1

Check the route table of server B and remove the default route::

  $ ip route show
  default via 192.168.122.1 dev ens3 proto static
  192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.251
  $ sudo ip route delete default via 192.168.122.1 dev ens3 proto static

On server A, we contact 8.8.8.8 again::

  $ nc -v 8.8.8.8 53
  nc: connect to 8.8.8.8 port 53 (tcp) failed: Network is unreachable

On server B, run nstat::

  $ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutDestUnreachs             1                  0.0
  IcmpMsgOutType3                 1                  0.0
  IpExtInNoRoutes                 1                  0.0
  IpExtInOctets                   60                 0.0
  IpExtOutOctets                  88                 0.0
  IpExtInNoECTPkts                1                  0.0

We enabled IP forwarding on server B, so when server B received a packet
whose destination IP address was 8.8.8.8, it tried to forward the
packet. We had deleted the default route, so there was no route for
8.8.8.8; server B increased IpExtInNoRoutes and sent an "ICMP
Destination Unreachable" message to server A.

Third, generate IpOutNoRoutes.
Run the ping command on server B::

  $ ping -c 1 8.8.8.8
  connect: Network is unreachable

Run nstat on server B::

  $ nstat
  #kernel
  IpOutNoRoutes                   1                  0.0

We have deleted the default route on server B. Server B couldn't find
a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.

TcpExtTCPACKSkippedSynRecv
--------------------------
In this test, we send 3 identical SYN packets from client to server. The
first SYN makes the server create a socket, set it to Syn-Recv status,
and reply with a SYN/ACK. The second SYN makes the server reply with the
SYN/ACK again and record the reply time (the duplicate ACK reply time).
The third SYN makes the server check the previous duplicate ACK reply
time and decide to skip the duplicate ACK, then increase the
TcpExtTCPACKSkippedSynRecv counter.

Run tcpdump to capture a SYN packet::

  nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

Open another terminal and run the nc command::

  nstatuser@nstat-a:~$ nc nstat-b 9000

As nstat-b didn't listen on port 9000, it replied with a RST, and
the nc command exited immediately. That was enough for the tcpdump
command to capture a SYN packet.
A Linux server might use hardware
offload for the TCP checksum, so the checksum in /tmp/syn.pcap
might not be correct. We call tcprewrite to fix it::

  nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum

On nstat-b, we run nc to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, we block the packets from port 9000, otherwise nstat-a would
send a RST to nstat-b::

  nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP

Send the SYN to nstat-b 3 times::

  nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done

Check the SNMP counter on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSynRecv      1                  0.0

As we expected, TcpExtTCPACKSkippedSynRecv is 1.

TcpExtTCPACKSkippedPAWS
-----------------------
To trigger PAWS, we could send an old SYN.

On nstat-b, let nc listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, run tcpdump to capture a SYN::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-a, run nc as a client to connect to nstat-b::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Now tcpdump has captured the SYN and exited. We should fix the
checksum::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum

Send the SYN packet twice::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done

On nstat-b, check the SNMP counter::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedPAWS         1                  0.0

We sent two SYNs via tcpreplay, and both of them failed the PAWS
check. nstat-b replied with an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
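
To illustrate why an old packet fails the PAWS check, here is a
simplified sketch of the timestamp comparison described in RFC 7323 (a
segment is unacceptable when its TSval is older than the most recent
timestamp seen on the connection). The function name is hypothetical,
and the sketch deliberately ignores details of the real kernel check,
such as the exception for connections that have been idle for a long
time.

```python
def paws_rejects(ts_recent, seg_tsval):
    """Return True if seg_tsval is older than ts_recent.

    TCP timestamps are 32-bit values that wrap around, so the difference
    is interpreted as a signed 32-bit number.
    """
    diff = (ts_recent - seg_tsval) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff > 0


# A replayed packet carries a timestamp older than the latest one seen.
print(paws_rejects(ts_recent=1000, seg_tsval=900))
# A packet with the same or a newer timestamp passes the check.
print(paws_rejects(ts_recent=1000, seg_tsval=1000))
```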

TcpExtTCPACKSkippedSeq
----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have a valid
timestamp (to pass the PAWS check) but whose sequence number is out of
window. The Linux TCP stack does not skip the ACK if the packet carries
data, so we need a pure ACK packet. To generate such a packet, we
could create two sockets: one on port 9000, another on port 9001. Then
we capture an ACK on port 9001 and change the source/destination port
numbers to match the port 9000 socket. Then we could trigger
TcpExtTCPACKSkippedSeq via this packet.

On nstat-b, open two terminals and run two nc commands to listen on both
port 9000 and port 9001::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)

On nstat-a, run two nc clients::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

  nstatuser@nstat-a:~$ nc -v nstat-b 9001
  Connection to nstat-b 9001 port [tcp/*] succeeded!

On nstat-a, run tcpdump to capture an ACK::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-b, send a packet via the port 9001 socket. E.g.
we sent a
string 'foo' in our example::

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)
  Connection from nstat-a 42132 received!
  foo

On nstat-a, tcpdump should have captured the ACK. We should check
the source port numbers of the two nc clients::

  nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
  State  Recv-Q   Send-Q         Local Address:Port           Peer Address:Port
  ESTAB  0        0            192.168.122.250:50208       192.168.122.251:9000
  ESTAB  0        0            192.168.122.250:42132       192.168.122.251:9001

Run tcprewrite to change port 9001 to port 9000, and port 42132 to
port 50208::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum

Now /tmp/seq.pcap is the packet we need. Send it to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done

Check TcpExtTCPACKSkippedSeq on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSeq          1                  0.0
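
The packet is sent twice because the first copy gets a duplicate ACK
and only the second copy is skipped. The sketch below is a toy model of
that rate limiting, not kernel code: the class name is hypothetical,
and the 0.5 second interval is an assumption based on the default of
the net.ipv4.tcp_invalid_ratelimit sysctl.

```python
class DupAckLimiter:
    """Toy model of rate limiting duplicate ACKs for invalid segments."""

    def __init__(self, min_interval=0.5):
        # Assumed minimum time in seconds between two duplicate ACKs.
        self.min_interval = min_interval
        self.last_ack_time = None
        self.skipped = 0  # plays the role of a TcpExtTCPACKSkipped* counter

    def on_invalid_segment(self, now):
        """Return True if a duplicate ACK is sent, False if it is skipped."""
        if (self.last_ack_time is not None
                and now - self.last_ack_time < self.min_interval):
            self.skipped += 1
            return False
        self.last_ack_time = now
        return True


limiter = DupAckLimiter()
# Two replayed copies of the same out-of-window ACK, 10 ms apart.
first = limiter.on_invalid_segment(now=0.00)
second = limiter.on_invalid_segment(now=0.01)
print(first, second, limiter.skipped)
```

In this model the first copy triggers a duplicate ACK and the second
copy is skipped, which matches the counter value of 1 shown above.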