============
SNMP counter
============

This document explains the meaning of SNMP counters.

General IPv4 counters
=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.
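
As a minimal sketch of how the counter names used in this document can
be obtained, the snippet below parses text laid out as pairs of
"Prefix: names" and "Prefix: values" lines. The file paths
/proc/net/snmp and /proc/net/netstat and the helper names are
assumptions for illustration; they are not defined by this document.

```python
def parse_snmp_text(text):
    """Parse snmp-style text into a dict such as {"IpInReceives": 10}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: a header line with names, then a line with values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched lines: {prefix!r} vs {value_prefix!r}")
        for name, number in zip(names.split(), numbers.split()):
            # "Ip" + "InReceives" gives the name used in this document.
            counters[prefix + name] = int(number)
    return counters


def read_counters(paths=("/proc/net/snmp", "/proc/net/netstat")):
    """Read and merge counters from the (assumed) procfs files."""
    counters = {}
    for path in paths:
        with open(path) as f:
            counters.update(parse_snmp_text(f.read()))
    return counters
```

Calling read_counters() on a Linux host returns one dict, so a counter
such as IpInReceives can be looked up directly by name.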

* IpInReceives

Defined in `RFC1213 ipInReceives`_

.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26

The number of packets received by the IP layer. It is increased at the
beginning of the ip_rcv function and is always updated together with
IpExtInOctets. It is increased even if the packet is dropped later
(e.g. because the IP header is invalid or the checksum is wrong).
It indicates the number of aggregated segments after GRO/LRO.

* IpInDelivers

Defined in `RFC1213 ipInDelivers`_

.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28

The number of packets delivered to the upper layer protocols, e.g. TCP,
UDP, ICMP and so on. If no one listens on a raw socket, only
kernel-supported protocols will be delivered; if someone listens on a
raw socket, all valid IP packets will be delivered.

* IpOutRequests

Defined in `RFC1213 ipOutRequests`_

.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28

The number of packets sent via the IP layer, for both unicast and
multicast packets. It is always updated together with
IpExtOutOctets.

* IpExtInOctets and IpExtOutOctets

They are Linux kernel extensions and have no RFC definitions. Note that
RFC1213 does define ifInOctets and ifOutOctets, but they are different
things: ifInOctets and ifOutOctets include the MAC layer header size,
whereas IpExtInOctets and IpExtOutOctets only include the IP layer
header and the IP layer data.

* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts

They indicate the number of the four kinds of ECN IP packets. Please
refer to `Explicit Congestion Notification`_ for more details.

.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6

These four counters count how many packets are received per ECN
status. They count the real frame number regardless of LRO/GRO. So
for the same packet, you might find that IpInReceives counts 1, but
IpExtInNoECTPkts counts 2 or more.
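
To make the four ECN kinds concrete, the sketch below maps the two ECN
bits of the IPv4 TOS byte (the codepoints from RFC 3168) to the counter
that would be expected to increase. The function and table names are
illustrative only and do not come from the kernel.

```python
# ECN codepoints from RFC 3168: the two least significant bits of the
# IPv4 TOS / traffic class byte.
ECN_COUNTER_BY_CODEPOINT = {
    0b00: "IpExtInNoECTPkts",  # Not-ECT
    0b01: "IpExtInECT1Pkts",   # ECT(1)
    0b10: "IpExtInECT0Pkts",   # ECT(0)
    0b11: "IpExtInCEPkts",     # CE (congestion experienced)
}


def ecn_counter_for_tos(tos):
    """Return the counter name matching the ECN bits of a TOS byte."""
    return ECN_COUNTER_BY_CODEPOINT[tos & 0x03]
```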

* IpInHdrErrors

Defined in `RFC1213 ipInHdrErrors`_. It indicates that the packet is
dropped due to an IP header error. It might happen in both the IP input
and IP forward paths.

.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27

* IpInAddrErrors

Defined in `RFC1213 ipInAddrErrors`_. It is increased in two
scenarios: (1) the IP address is invalid; (2) the destination IP
address is not a local address and IP forwarding is not enabled.

.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27

* IpExtInNoRoutes

This counter means the packet is dropped because the IP stack receives
a packet and can't find a route for it in the route table. It might
happen when IP forwarding is enabled, the destination IP address is
not a local address, and there is no route for the destination IP
address.

* IpInUnknownProtos

Defined in `RFC1213 ipInUnknownProtos`_. It is increased if the
layer 4 protocol is unsupported by the kernel. If an application is
using a raw socket, the kernel always delivers the packet to the raw
socket and this counter isn't increased.

.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27

* IpExtInTruncatedPkts

For an IPv4 packet, it means the actual data size is smaller than the
"Total Length" field in the IPv4 header.
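
The truncation condition can be sketched as follows: compare the number
of bytes actually present with the 16-bit "Total Length" field at byte
offset 2 of the IPv4 header. The function name is hypothetical, and this
is only an illustration of the condition, not the kernel's code.

```python
import struct


def is_truncated_ipv4(packet):
    """Return True if fewer bytes are present than Total Length claims."""
    if len(packet) < 20:
        raise ValueError("shorter than a minimal IPv4 header")
    # Total Length is a 16-bit big-endian field at byte offset 2.
    (total_length,) = struct.unpack_from("!H", packet, 2)
    return len(packet) < total_length
```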

* IpInDiscards

Defined in `RFC1213 ipInDiscards`_. It indicates that the packet is
dropped in the IP receiving path due to kernel internal reasons (e.g.
not enough memory).

.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28

* IpOutDiscards

Defined in `RFC1213 ipOutDiscards`_. It indicates that the packet is
dropped in the IP sending path due to kernel internal reasons.

.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28

* IpOutNoRoutes

Defined in `RFC1213 ipOutNoRoutes`_. It indicates that the packet is
dropped in the IP sending path because no route is found for it.

.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29

ICMP counters
=============
* IcmpInMsgs and IcmpOutMsgs

Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_

.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43

As mentioned in RFC1213, these two counters include errors: they are
increased even if the ICMP packet has an invalid type. The ICMP output
path checks the header of a raw socket, so IcmpOutMsgs is still
updated if the IP header is constructed by a userspace program.

* ICMP named types

| These counters cover most of the common ICMP types. They are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
| IcmpInRedirects: `RFC1213 icmpInRedirects`_
| IcmpInEchos: `RFC1213 icmpInEchos`_
| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
| IcmpOutEchos: `RFC1213 icmpOutEchos`_
| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_

.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43

.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46

Every ICMP type has two counters: 'In' and 'Out'. E.g., for the ICMP
Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
straightforward: the 'In' counter means the kernel receives such a
packet and the 'Out' counter means the kernel sends such a packet.

* ICMP numeric types

They are IcmpMsgInType[N] and IcmpMsgOutType[N], where [N] indicates
the ICMP type number. These counters track all kinds of ICMP packets.
The ICMP type number definitions can be found in the `ICMP parameters`_
document.

.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml

For example, if the Linux kernel sends an ICMP Echo packet,
IcmpMsgOutType8 increases by 1. And if the kernel gets an ICMP Echo
Reply packet, IcmpMsgInType0 increases by 1.

* IcmpInCsumErrors

This counter indicates that the checksum of the ICMP packet is
wrong. The kernel verifies the checksum after updating IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has a bad checksum,
IcmpInMsgs is updated but none of the IcmpMsgInType[N] counters is
updated.

* IcmpInErrors and IcmpOutErrors

Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_

.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43

When an error occurs in the ICMP packet handler path, these two
counters are updated. The receiving packet path uses IcmpInErrors
and the sending packet path uses IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors is always increased too.

Relationship of the ICMP counters
---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal to or larger than IcmpInMsgs. When the
kernel receives an ICMP packet, it follows the logic below:

1. increase IcmpInMsgs
2. if there is any error, update IcmpInErrors and finish the process
3. update IcmpMsgInType[N]
4. handle the packet depending on the type; if there is any error,
   update IcmpInErrors and finish the process

So if all errors occur in step (2), IcmpInMsgs should be equal to the
sum of IcmpMsgInType[N] plus IcmpInErrors. If all errors occur in
step (4), IcmpInMsgs should be equal to the sum of
IcmpMsgInType[N]. If the errors occur in both step (2) and step (4),
IcmpInMsgs should be less than the sum of IcmpMsgInType[N] plus
IcmpInErrors.
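
The two relationships above can be checked with a short sketch. It
assumes the counters are already available as a dict keyed by the names
used in this document; how the dict is obtained and the function names
are assumptions for illustration. As the counters are not sampled
atomically, a check on a busy host might fail transiently.

```python
import re


def _sum_matching(counters, pattern):
    """Sum all counter values whose name fully matches the pattern."""
    return sum(v for k, v in counters.items() if re.fullmatch(pattern, k))


def check_icmp_counters(counters):
    """Evaluate the two ICMP counter relationships described above."""
    in_types = _sum_matching(counters, r"IcmpMsgInType\d+")
    out_types = _sum_matching(counters, r"IcmpMsgOutType\d+")
    in_msgs = counters.get("IcmpInMsgs", 0)
    in_errors = counters.get("IcmpInErrors", 0)
    return {
        # The sum of IcmpMsgOutType[N] equals IcmpOutMsgs.
        "out_sum_matches": out_types == counters.get("IcmpOutMsgs", 0),
        # The sum of IcmpMsgInType[N] plus IcmpInErrors >= IcmpInMsgs.
        "in_sum_covers": in_types + in_errors >= in_msgs,
    }
```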

General TCP counters
====================
* TcpInSegs

Defined in `RFC1213 tcpInSegs`_

.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets received by the TCP layer. As mentioned in
RFC1213, it includes the packets received in error, such as checksum
error, invalid TCP header and so on. Only one error won't be included:
if the layer 2 destination address is not the NIC's layer 2
address. It might happen if the packet is a multicast or broadcast
packet, or the NIC is in promiscuous mode. In these situations, the
packets are delivered to the TCP layer, but the TCP layer discards
these packets before increasing TcpInSegs. The TcpInSegs counter
isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter only increases by 1.

* TcpOutSegs

Defined in `RFC1213 tcpOutSegs`_

.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets sent by the TCP layer. As mentioned in RFC1213,
it excludes the retransmitted packets, but it includes the SYN, ACK
and RST packets. Unlike TcpInSegs, TcpOutSegs is aware of GSO, so if
a packet is split into 2 by GSO, TcpOutSegs increases by 2.

* TcpActiveOpens

Defined in `RFC1213 tcpActiveOpens`_

.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer sends a SYN and enters the SYN-SENT
state. Every time TcpActiveOpens increases by 1, TcpOutSegs should
also increase by 1.

* TcpPassiveOpens

Defined in `RFC1213 tcpPassiveOpens`_

.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer receives a SYN, replies with a SYN+ACK, and
enters the SYN-RCVD state.

* TcpExtTCPRcvCoalesce

When packets are received by the TCP layer and have not yet been read
by the application, the TCP layer will try to merge them. This counter
indicates how many packets are merged in such a situation. If GRO is
enabled, lots of packets are merged by GRO; these packets are not
counted in TcpExtTCPRcvCoalesce.

* TcpExtTCPAutoCorking

When sending packets, the TCP layer will try to merge small packets
into a bigger one. This counter increases by 1 for every packet merged
in such a situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/

* TcpExtTCPOrigDataSent

This counter is explained by `kernel commit f19c29e3e391`_. The
explanation is quoted below::

  TCPOrigDataSent: number of outgoing packets with original data (excluding
  retransmission but including data-in-SYN). This counter is different from
  TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
  more useful to track the TCP retransmission rate.
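
As a rough sketch of how TcpExtTCPOrigDataSent could be used to track
the retransmission rate, the snippet below divides the increase of the
retransmitted segment counter by the increase of TcpExtTCPOrigDataSent
between two samples. Using TcpRetransSegs as the numerator is an
assumption made for illustration; it is not taken from the commit
message above, and the function name is hypothetical.

```python
def retransmission_rate(before, after):
    """Retransmitted segments per original data segment between samples.

    Both arguments are dicts of counter name to value. Returns None
    when no original data was sent in the interval.
    """
    sent = after["TcpExtTCPOrigDataSent"] - before["TcpExtTCPOrigDataSent"]
    # Assumption: TcpRetransSegs is used as the retransmission count.
    retrans = after["TcpRetransSegs"] - before["TcpRetransSegs"]
    if sent <= 0:
        return None
    return retrans / sent
```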

* TcpExtTCPSynRetrans

This counter is explained by `kernel commit f19c29e3e391`_. The
explanation is quoted below::

  TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
  retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

* TcpExtTCPFastOpenActiveFail

This counter is explained by `kernel commit f19c29e3e391`_. The
explanation is quoted below::

  TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
  the remote does not accept it or the attempts timed out.

.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd

* TcpExtListenOverflows and TcpExtListenDrops

When the kernel receives a SYN from a client and the TCP accept queue
is full, the kernel drops the SYN and adds 1 to TcpExtListenOverflows.
At the same time the kernel also adds 1 to TcpExtListenDrops. Whenever
a TCP socket is in the LISTEN state and the kernel needs to drop a
packet, the kernel adds 1 to TcpExtListenDrops. So an increase of
TcpExtListenOverflows always causes TcpExtListenDrops to increase at
the same time, but TcpExtListenDrops can also increase without
TcpExtListenOverflows increasing, e.g. a memory allocation failure
also causes TcpExtListenDrops to increase.

Note: The above explanation is based on kernel version 4.10 or later.
On an older kernel, the TCP stack behaves differently when the TCP
accept queue is full. On the old kernel, the TCP stack won't drop the
SYN; it completes the 3-way handshake. As the accept queue is full,
the TCP stack keeps the socket in the TCP half-open queue. As it is in
the half-open queue, the TCP stack sends SYN+ACK on an exponential
backoff timer. After the client replies with an ACK, the TCP stack
checks whether the accept queue is still full. If it is not full, the
socket is moved to the accept queue. If it is full, the socket is kept
in the half-open queue, and the next time the client replies with an
ACK, this socket gets another chance to move to the accept queue.
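
Since every accept queue overflow also increases TcpExtListenDrops, the
drops caused by other reasons can be estimated by subtraction. The
sketch below assumes two samples of the counters as dicts; the function
name and the returned keys are illustrative only.

```python
def listen_drop_breakdown(before, after):
    """Split the TcpExtListenDrops increase into overflow and other drops."""
    overflows = after["TcpExtListenOverflows"] - before["TcpExtListenOverflows"]
    drops = after["TcpExtListenDrops"] - before["TcpExtListenDrops"]
    return {
        "overflows": overflows,
        # Drops in the LISTEN state not explained by a full accept queue,
        # e.g. memory allocation failures.
        "other_drops": drops - overflows,
    }
```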


TCP Fast Path
=============
When the kernel receives a TCP packet, it has two paths to handle the
packet: the fast path and the slow path. The comment in the kernel
code provides a good explanation of them, quoted below::

  It is split into a fast path and a slow path. The fast path is
  disabled when:

  - A zero window was announced from us
  - zero window probing
    is only handled properly on the slow path.
  - Out of order segments arrived.
  - Urgent data is expected.
  - There is no buffer space left
  - Unexpected TCP flags/window values/header lengths are received
    (detected by checking the TCP header against pred_flags)
  - Data is sent in both directions. The fast path only supports pure senders
    or pure receivers (this means either the sequence number or the ack
    value must stay constant)
  - Unexpected TCP option.

The kernel will try to use the fast path unless any of the above
conditions is satisfied. If the packets are out of order, the kernel
handles them in the slow path, which means the performance might not
be very good. The kernel also enters the slow path if "Delayed ack" is
used, because when using "Delayed ack", the data is sent in both
directions. When the TCP window scale option is not used, the kernel
tries to enable the fast path immediately when the connection enters
the established state, but if the TCP window scale option is used, the
kernel disables the fast path at first, and tries to enable it after
the kernel receives packets.

* TcpExtTCPPureAcks and TcpExtTCPHPAcks

If a packet has the ACK flag set and has no data, it is a pure ACK
packet. If the kernel handles it in the fast path, TcpExtTCPHPAcks
increases by 1; if the kernel handles it in the slow path,
TcpExtTCPPureAcks increases by 1.

* TcpExtTCPHPHits

If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits
increases by 1.


TCP abort
=========

* TcpExtTCPAbortOnData

It means the TCP layer has data in flight but needs to close the
connection, so the TCP layer sends a RST to the other side, indicating
that the connection is not closed gracefully. An easy way to increase
this counter is to use the SO_LINGER option. Please refer to the
SO_LINGER section of the `socket man page`_:

.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html

By default, when an application closes a connection, the close function
returns immediately and the kernel tries to send the in-flight data
asynchronously. If you use the SO_LINGER option with l_onoff set to 1
and l_linger set to a positive number, the close function won't return
immediately, but waits until the in-flight data is acked by the other
side; the maximum wait time is l_linger seconds. If l_onoff is set to 1
and l_linger is set to 0, when the application closes a connection, the
kernel sends a RST immediately and increases the TcpExtTCPAbortOnData
counter.
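
The SO_LINGER settings described above can be applied from userspace
roughly as follows. This is a minimal sketch that assumes struct linger
is laid out as two C ints (l_onoff, l_linger), as on Linux; the helper
names are hypothetical.

```python
import socket
import struct


def set_linger(sock, onoff, seconds):
    """Set SO_LINGER with the given l_onoff and l_linger values."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", onoff, seconds))


def get_linger(sock):
    """Return the current (l_onoff, l_linger) pair of the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    return struct.unpack("ii", raw)
```

For example, calling set_linger(conn, 1, 0) on a connected TCP socket
and then conn.close() is expected to close the connection with a RST
instead of the normal FIN sequence.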

* TcpExtTCPAbortOnClose

This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
the kernel sends a RST to the other side of the TCP connection.

* TcpExtTCPAbortOnMemory

When an application closes a TCP connection, the kernel still needs to
track the connection to let it complete the TCP disconnect process.
E.g. an app calls the close method of a socket and the kernel sends a
FIN to the other side of the connection. The app then has no
relationship with the socket any more, but the kernel needs to keep
the socket. This socket becomes an orphan socket; the kernel waits for
the reply of the other side, and finally enters the TIME_WAIT state.
When the kernel doesn't have enough memory to keep the orphan socket,
it sends a RST to the other side and deletes the socket. In such a
situation, the kernel adds 1 to TcpExtTCPAbortOnMemory. Two conditions
trigger TcpExtTCPAbortOnMemory:

1. the memory used by the TCP protocol is higher than the third value of
   tcp_mem. Please refer to the tcp_mem section in the `TCP man page`_.

.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html

2. the orphan socket count is higher than net.ipv4.tcp_max_orphans


* TcpExtTCPAbortOnTimeout

This counter increases when any of the TCP timers expires. In such a
situation, the kernel won't send a RST; it just gives up the
connection.

* TcpExtTCPAbortOnLinger

When a TCP connection enters the FIN_WAIT_2 state, instead of waiting
for the FIN packet from the other side, the kernel could send a RST and
delete the socket immediately. This is not the default behavior of the
Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you can let the kernel follow this behavior.

* TcpExtTCPAbortFailed

The kernel TCP layer sends a RST if the condition in the
`RFC2525 2.17 section`_ is satisfied. If an internal error occurs
during this process, TcpExtTCPAbortFailed is increased.

.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50

TCP Hybrid Slow Start
=====================
The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is
approached. The two pieces of information are the ACK train length and
the increase in packet delay. For detailed information, please refer to
the `Hybrid Slow Start paper`_. When either the ACK train length or the
packet delay hits a specific threshold, the congestion control
algorithm enters the Congestion Avoidance state. As of v4.20, two
congestion control algorithms use Hybrid Slow Start: cubic (the
default congestion control algorithm) and cdg. Four SNMP counters
relate to the Hybrid Slow Start algorithm.

.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf

* TcpExtTCPHystartTrainDetect

How many times the ACK train length threshold is detected.

* TcpExtTCPHystartTrainCwnd

The sum of CWND detected by ACK train length. Dividing this value by
TcpExtTCPHystartTrainDetect gives the average CWND detected by the
ACK train length.

* TcpExtTCPHystartDelayDetect

How many times the packet delay threshold is detected.

* TcpExtTCPHystartDelayCwnd

The sum of CWND detected by packet delay. Dividing this value by
TcpExtTCPHystartDelayDetect gives the average CWND detected by the
packet delay.
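
The two averages described above can be computed as in the sketch
below, which assumes the counters are available as a dict keyed by the
names used in this document. The function name and the returned keys
are illustrative only.

```python
def hystart_average_cwnd(counters):
    """Average CWND at Hybrid Slow Start exit, per detection method."""
    def average(total_key, count_key):
        count = counters.get(count_key, 0)
        # No detections yet: the average is undefined.
        return counters.get(total_key, 0) / count if count else None

    return {
        "train": average("TcpExtTCPHystartTrainCwnd",
                         "TcpExtTCPHystartTrainDetect"),
        "delay": average("TcpExtTCPHystartDelayCwnd",
                         "TcpExtTCPHystartDelayDetect"),
    }
```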
520712ee16cSyupeng
5218e2ea53aSyupengTCP retransmission and congestion control
522ae5220c6SRandy Dunlap=========================================
5238e2ea53aSyupengThe TCP protocol has two retransmission mechanisms: SACK and fast
5248e2ea53aSyupengrecovery. They are exclusive with each other. When SACK is enabled,
5258e2ea53aSyupengthe kernel TCP stack would use SACK, or kernel would use fast
5268e2ea53aSyupengrecovery. The SACK is a TCP option, which is defined in `RFC2018`_,
5278e2ea53aSyupengthe fast recovery is defined in `RFC6582`_, which is also called
5288e2ea53aSyupeng'Reno'.
5298e2ea53aSyupeng
5308e2ea53aSyupengThe TCP congestion control is a big and complex topic. To understand
5318e2ea53aSyupengthe related snmp counter, we need to know the states of the congestion
5328e2ea53aSyupengcontrol state machine. There are 5 states: Open, Disorder, CWR,
5338e2ea53aSyupengRecovery and Loss. For details about these states, please refer page 5
5348e2ea53aSyupengand page 6 of this document:
5358e2ea53aSyupenghttps://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf
5368e2ea53aSyupeng
5378e2ea53aSyupeng.. _RFC2018: https://tools.ietf.org/html/rfc2018
5388e2ea53aSyupeng.. _RFC6582: https://tools.ietf.org/html/rfc6582
5398e2ea53aSyupeng
* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery

When congestion control enters the Recovery state,
TcpExtTCPSackRecovery is increased by 1 if SACK is used, and
TcpExtTCPRenoRecovery is increased by 1 if SACK is not used. These two
counters mean the TCP stack begins to retransmit the lost packets.
5468e2ea53aSyupeng
* TcpExtTCPSACKReneging

A packet was acknowledged by SACK, but the receiver has dropped this
packet, so the sender needs to retransmit it. In this situation, the
sender adds 1 to TcpExtTCPSACKReneging. A receiver may drop a packet
which has been acknowledged by SACK; although this is unusual, it is
allowed by the TCP protocol. The sender doesn't really know what
happened on the receiver side. It just waits until the RTO expires
for this packet, and then assumes the packet has been dropped by the
receiver.
5578e2ea53aSyupeng
* TcpExtTCPRenoReorder

Packet reordering is detected by fast recovery. This counter is only
used if SACK is disabled. The fast recovery algorithm detects
reordering by the number of duplicate ACKs. E.g., if a retransmission
is triggered, and the original packet was not lost but just out of
order, the receiver acknowledges multiple times: once for the
retransmitted packet, and once for the arrival of the original
out-of-order packet. Thus the sender sees more ACKs than it expects,
and knows that reordering has occurred.
5688e2ea53aSyupeng
* TcpExtTCPTSReorder

Packet reordering is detected when a hole is filled. E.g., assume the
sender sends packets 1,2,3,4,5, and the receiving order is
1,2,4,5,3. When the sender receives the ACK of packet 3 (which
fills the hole), either of two conditions will increase
TcpExtTCPTSReorder by 1: (1) packet 3 has not been retransmitted
yet; (2) packet 3 has been retransmitted, but the timestamp of
packet 3's ACK is earlier than the retransmission timestamp.
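
The two conditions can be written down as a small predicate. This is
a toy model of the description above, not the kernel's
implementation: the function name and its parameters are
hypothetical, and timestamp wraparound is ignored::

  def counts_as_ts_reorder(retransmitted, ack_ts=None, retrans_ts=None):
      """Return True if the ACK that fills a hole indicates reordering."""
      # Condition (1): the hole was filled before any retransmission.
      if not retransmitted:
          return True
      # Without both timestamps the second condition cannot be checked.
      if ack_ts is None or retrans_ts is None:
          return False
      # Condition (2): the ACK carries a timestamp older than the
      # retransmission, so it acknowledges the original transmission.
      return ack_ts < retrans_ts

For example, counts_as_ts_reorder(True, ack_ts=100, retrans_ts=150)
is True: the ACK is older than the retransmission, so the original
packet 3 arrived late instead of being lost.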
5788e2ea53aSyupeng
* TcpExtTCPSACKReorder

Packet reordering is detected by SACK. SACK has two methods to
detect reordering: (1) A DSACK is received by the sender. It means the
sender sent the same packet more than once, and the only reason
is that the sender believed an out-of-order packet was lost, so it
sent the packet again. (2) Assume packets 1,2,3,4,5 are sent by the
sender, and the sender has received SACKs for packets 2 and 5. If the
sender now receives a SACK for packet 4 and has not retransmitted
that packet yet, the sender knows packet 4 is out of order. The
kernel TCP stack increases TcpExtTCPSACKReorder in both of the above
scenarios.
5918e2ea53aSyupeng
5928e2ea53aSyupeng
DSACK
=====
DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
duplicate packets to the sender. There are two kinds of
duplication: (1) a packet which has already been acknowledged is
duplicated; (2) an out-of-order packet is duplicated. The TCP stack
counts these two kinds of duplication on both the receiver side and
the sender side.

.. _RFC2883: https://tools.ietf.org/html/rfc2883
6038e2ea53aSyupeng
* TcpExtTCPDSACKOldSent

The TCP stack receives a duplicate packet which has already been
acked, so it sends a DSACK to the sender.

* TcpExtTCPDSACKOfoSent

The TCP stack receives an out-of-order duplicate packet, so it sends a
DSACK to the sender.

* TcpExtTCPDSACKRecv

The TCP stack receives a DSACK, which indicates that an acknowledged
duplicate packet was received.

* TcpExtTCPDSACKOfoRecv

The TCP stack receives a DSACK, which indicates that an out-of-order
duplicate packet was received.
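
On the sender side, the two kinds of DSACK can be told apart by the
rule given in RFC 2883: the first SACK block either reports data
below the cumulative ACK, or it is contained in the second SACK
block. Below is a minimal sketch of that rule. The function name is
hypothetical, the mapping to counter names follows the descriptions
above, and sequence number wraparound is ignored::

  def classify_dsack(ack_seq, sack_blocks):
      """Classify the first SACK block of a received ACK.

      ack_seq is the cumulative ACK number; sack_blocks is a list of
      (start, end) tuples in the order they appear in the SACK option.
      """
      if not sack_blocks:
          return None
      start0, end0 = sack_blocks[0]
      # The block reports data below the cumulative ACK: a duplicate
      # of a packet which has already been acknowledged.
      if start0 < ack_seq:
          return "TcpExtTCPDSACKRecv"
      # The block lies inside the second block: a duplicate of a
      # packet which is still out of order.
      if len(sack_blocks) > 1:
          start1, end1 = sack_blocks[1]
          if start1 <= start0 and end0 <= end1:
              return "TcpExtTCPDSACKOfoRecv"
      # An ordinary SACK block, not a DSACK.
      return None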
6232b965472Syupeng
TCP out of order
================
* TcpExtTCPOFOQueue

The TCP layer receives an out-of-order packet and has enough memory
to queue it.

* TcpExtTCPOFODrop

The TCP layer receives an out-of-order packet but doesn't have enough
memory, so it drops the packet. Such packets are not counted in
TcpExtTCPOFOQueue.

* TcpExtTCPOFOMerge

The received out-of-order packet overlaps with the previous
packet. The overlapping part is dropped. All TcpExtTCPOFOMerge
packets are also counted in TcpExtTCPOFOQueue.
6422b965472Syupeng
TCP PAWS
========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets. It depends on the TCP
timestamps. For detailed information, please refer to the
`timestamp wiki`_ and the `RFC of PAWS`_.

.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps

* TcpExtPAWSActive

Packets are dropped by PAWS in the Syn-Sent state.

* TcpExtPAWSEstab

Packets are dropped by PAWS in any state other than Syn-Sent.
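
The core of the PAWS check is a comparison between the timestamp
value of an incoming segment and the most recent timestamp accepted
on the connection. Below is a minimal sketch, assuming 32-bit
timestamps which are compared modulo 2**32. The function name is
hypothetical, and additional conditions of the real check (for
example the handling of connections which were idle for a very long
time) are left out::

  def paws_rejects(seg_tsval, ts_recent):
      """Return True if the segment is older than the last accepted one.

      Both arguments are 32-bit TCP timestamp values. The comparison
      is done modulo 2**32, so it keeps working after the timestamp
      clock wraps around.
      """
      diff = (seg_tsval - ts_recent) & 0xFFFFFFFF
      # A difference in the upper half of the 32-bit space means that
      # seg_tsval is "before" ts_recent.
      return diff >= 0x80000000

A segment with timestamp 5 which arrives after a segment with
timestamp 0xFFFFFFF0 is accepted by this check, because the small
value is interpreted as a wrapped, newer timestamp.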
6602b965472Syupeng
TCP ACK skip
============
In some scenarios, the kernel avoids sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When the kernel decides to skip an
ACK due to tcp_invalid_ratelimit, it updates one of the counters below
to indicate in which scenario the ACK was skipped. An ACK is only
skipped if the received packet is either a SYN packet or has no data.

.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
6722b965472Syupeng
* TcpExtTCPACKSkippedSynRecv

The ACK is skipped in the Syn-Recv state. The Syn-Recv state means the
TCP stack has received a SYN and replied with a SYN+ACK, and is now
waiting for an ACK. Generally, the TCP stack doesn't need to send an
ACK in the Syn-Recv state. But in several scenarios, it does: e.g.,
the TCP stack receives the same SYN packet repeatedly, the received
packet does not pass the PAWS check, or the received packet's sequence
number is out of window. If the ACK sending frequency is higher than
tcp_invalid_ratelimit allows, the TCP stack skips sending the ACK and
increases TcpExtTCPACKSkippedSynRecv.
6852b965472Syupeng
6862b965472Syupeng
* TcpExtTCPACKSkippedPAWS

The ACK is skipped because the PAWS (Protection Against Wrapped
Sequence numbers) check fails. If the PAWS check fails in the
Syn-Recv, Fin-Wait-2 or Time-Wait state, the skipped ACK is counted in
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait respectively. In all other states, the
skipped ACK is counted in TcpExtTCPACKSkippedPAWS.
6952b965472Syupeng
* TcpExtTCPACKSkippedSeq

The ACK is skipped because the sequence number is out of window, the
timestamp passes the PAWS check, and the TCP state is not Syn-Recv,
Fin-Wait-2 or Time-Wait.

* TcpExtTCPACKSkippedFinWait2

The ACK is skipped in the Fin-Wait-2 state. The reason is either that
the PAWS check fails or that the received sequence number is out of
window.

* TcpExtTCPACKSkippedTimeWait

The ACK is skipped in the Time-Wait state. The reason is either that
the PAWS check fails or that the received sequence number is out of
window.
7102b965472Syupeng
* TcpExtTCPACKSkippedChallenge

The ACK is skipped if the ACK is a challenge ACK. RFC 5961 defines
three kinds of challenge ACK; please refer to `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, in some TCP states, the Linux TCP stack also
sends challenge ACKs if the ACK number is before the first
unacknowledged number (stricter than `RFC 5961 section 5.2`_).

.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
7232b965472Syupeng
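The way the TCP state and the failure reason select one of the ACK
skip counters, as described in this section, can be summarised in a
small lookup. This is only a sketch of the rules above: challenge
ACKs are left out, and the state and reason strings are hypothetical
labels, not kernel identifiers::

  STATE_COUNTERS = {
      "SYN_RECV": "TcpExtTCPACKSkippedSynRecv",
      "FIN_WAIT2": "TcpExtTCPACKSkippedFinWait2",
      "TIME_WAIT": "TcpExtTCPACKSkippedTimeWait",
  }


  def skipped_ack_counter(state, reason):
      """Pick the counter for a rate-limited ACK.

      state is a TCP state label; reason is 'paws' (the PAWS check
      failed) or 'seq' (the sequence number is out of window).
      """
      # In these three states the state decides, whatever the reason.
      if state in STATE_COUNTERS:
          return STATE_COUNTERS[state]
      if reason == "paws":
          return "TcpExtTCPACKSkippedPAWS"
      if reason == "seq":
          return "TcpExtTCPACKSkippedSeq"
      raise ValueError("unknown reason: %r" % (reason,))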
7248e2ea53aSyupeng
Examples
========

ping test
---------
Run the ping command against the public DNS server 8.8.8.8::

  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms

  --- 8.8.8.8 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms
739b08794a9Syupeng
The nstat result::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0
756b08794a9Syupeng
The Linux server sent an ICMP Echo packet, so IpOutRequests,
IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased by 1. The
server got an ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
IcmpInEchoReps and IcmpMsgInType0 were increased by 1. The ICMP Echo
Reply was passed to the ICMP layer via the IP layer, so IpInDelivers
was increased by 1. The default ping data size is 56 bytes (shown as
"56(84) bytes of data" in the ping output), so an ICMP Echo packet
and its corresponding Echo Reply packet consist of:

* 14 bytes MAC header
* 20 bytes IP header
* 8 bytes ICMP header
* 56 bytes data (default value of the ping command)

The MAC header is not counted by the IP layer, so IpExtInOctets and
IpExtOutOctets are 20+8+56=84.
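
The ping output above reports "56(84) bytes of data": 56 bytes of
ICMP payload, which become 84 bytes once the ICMP Echo header and the
IPv4 header are added. Below is a minimal sketch of this arithmetic,
assuming an IPv4 header without options. The identifier and sequence
number values are arbitrary::

  import struct

  IP_HEADER_LEN = 20   # IPv4 header without options
  PING_DATA_LEN = 56   # default payload size of the ping command

  # ICMP Echo header: type, code, checksum, identifier, sequence number.
  icmp_header = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1)

  # The MAC header is not part of the IP packet, so it is not added.
  ip_packet_len = IP_HEADER_LEN + len(icmp_header) + PING_DATA_LEN
  print(len(icmp_header), ip_packet_len)

The result matches the value of IpExtInOctets and IpExtOutOctets in
the nstat output.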
77180cc4950Syupeng
TCP 3-way handshake
-------------------
On the server side, we run::

  nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client side, we run::

  nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
  Connection to 192.168.122.251 9000 port [tcp/*] succeeded!

The server listened on TCP port 9000, the client connected to it, and
they completed the 3-way handshake.
78680cc4950Syupeng
On the server side, we can find the nstat output below::

  nstatuser@nstat-b:~$ nstat | grep -i tcp
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0

On the client side, we can find the nstat output below::

  nstatuser@nstat-a:~$ nstat | grep -i tcp
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0
80180cc4950Syupeng
When the server received the first SYN, it replied with a SYN+ACK and
entered the SYN-RCVD state, so TcpPassiveOpens increased by 1. The
server received a SYN, sent a SYN+ACK, and received an ACK, so it sent
1 packet and received 2 packets: TcpInSegs increased by 2 and
TcpOutSegs increased by 1. The last ACK of the 3-way handshake is a
pure ACK without data, so TcpExtTCPPureAcks increased by 1.

When the client sent the SYN, it entered the SYN-SENT state, so
TcpActiveOpens increased by 1. The client sent a SYN, received a
SYN+ACK, and sent an ACK, so it sent 2 packets and received 1 packet:
TcpInSegs increased by 1 and TcpOutSegs increased by 2.
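
The TcpInSegs and TcpOutSegs values of the handshake can be
reproduced with a toy model which walks through the three segments.
It assumes that no segment is lost or retransmitted, and the names
used here are hypothetical::

  from collections import Counter

  # (sender, segment) in the order the segments appear on the wire.
  HANDSHAKE = [("client", "SYN"), ("server", "SYN+ACK"), ("client", "ACK")]


  def count_segments(segments):
      """Count sent and received segments for both endpoints."""
      stats = {"client": Counter(), "server": Counter()}
      for sender, _flags in segments:
          receiver = "server" if sender == "client" else "client"
          stats[sender]["TcpOutSegs"] += 1
          stats[receiver]["TcpInSegs"] += 1
      return stats

count_segments(HANDSHAKE) gives 2 received and 1 sent segment for the
server, and 1 received and 2 sent segments for the client, as in the
nstat outputs above.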
81380cc4950Syupeng
TCP normal traffic
------------------
Run nc on the server::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

Run nc on the client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Type a string into the nc client ('hello' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0
85980cc4950Syupeng
Type a string into the nc client again ('world' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
  world

Client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPAcks                 1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

Server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPHits                 1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0
89680cc4950Syupeng
Comparing the first and the second client-side nstat outputs, we can
find one difference: the first one had a 'TcpExtTCPPureAcks', but the
second one had a 'TcpExtTCPHPAcks'. The first and the second
server-side nstat outputs differed too: the second had a
'TcpExtTCPHPHits', but the first didn't. The network traffic patterns
were exactly the same: the client sent a packet to the server, and the
server replied with an ACK. But the kernel handled them in different
ways. When the TCP window scale option is not used, the kernel tries to
enable the fast path immediately when the connection enters the
established state, but if the TCP window scale option is used, the
kernel disables the fast path at first, and tries to enable it after
the kernel receives packets. We can use the 'ss' command to verify
whether the window scale option is used, e.g. run the command below on
either the server or the client::

  nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )'
  Netid    Recv-Q     Send-Q            Local Address:Port             Peer Address:Port
  tcp      0          0               192.168.122.250:40654         192.168.122.251:9000
             ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98
91780cc4950Syupeng
The 'wscale:7,7' means both the server and the client set the window
scale option to 7. Now we can explain the nstat output in our test:

In the first nstat output of the client side, the client sent a packet
and the server replied with an ACK. When the kernel handled this ACK,
the fast path was not enabled, so the ACK was counted in
'TcpExtTCPPureAcks'.

In the second nstat output of the client side, the client sent a packet
again and received another ACK from the server. This time, the fast
path was enabled and the ACK qualified for the fast path, so it was
handled by the fast path and counted in 'TcpExtTCPHPAcks'.

In the first nstat output of the server side, the fast path was not
enabled, so there was no 'TcpExtTCPHPHits'.

In the second nstat output of the server side, the fast path was
enabled, and the packet received from the client qualified for the
fast path, so it was counted in 'TcpExtTCPHPHits'.
93680cc4950Syupeng
TcpExtTCPAbortOnClose
---------------------
On the server side, we run the Python script below::

  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  # Accept one connection, but never read from it.
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

This Python script listens on port 9000, but doesn't read anything from
the connection.
95580cc4950Syupeng
On the client side, we send the string "hello" by nc::

  nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000

Then, we come back to the server side. The server has received the
"hello" packet, and the TCP layer has acked this packet, but the
application hasn't read it yet. We type Ctrl-C to terminate the server
script. Then we can find that TcpExtTCPAbortOnClose increased by 1 on
the server side::

  nstatuser@nstat-b:~$ nstat | grep -i abort
  TcpExtTCPAbortOnClose           1                  0.0

If we run tcpdump on the server side, we can see that the server sent a
RST after we typed Ctrl-C.
97080cc4950Syupeng
TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
---------------------------------------------------
Below is an example which lets the orphan socket count exceed
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on the client::

  sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"
97880cc4950Syupeng
Client code (creates 64 connections to the server)::

  nstatuser@nstat-a:~$ cat client_orphan.py
  import socket
  import time

  server = 'nstat-b' # server address
  port = 9000

  count = 64

  connection_list = []

  # Open 'count' connections and keep all of them open.
  for i in range(count):
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect((server, port))
      connection_list.append(s)
      print("connection_count: %d" % len(connection_list))

  while True:
      time.sleep(99999)
100080cc4950Syupeng
Server code (accepts 64 connections from the client)::

  nstatuser@nstat-b:~$ cat server_orphan.py
  import socket
  import time

  port = 9000
  count = 64

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(count)
  connection_list = []
  # Accept connections and keep them open without reading any data.
  while True:
      sock, addr = s.accept()
      connection_list.append((sock, addr))
      print("connection_count: %d" % len(connection_list))
101880cc4950Syupeng
Run the Python scripts on the server and the client.

On the server::

  python3 server_orphan.py

On the client::

  python3 client_orphan.py

Run iptables on the server::

  sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP

Type Ctrl-C on the client to stop client_orphan.py.
103480cc4950Syupeng
Check TcpExtTCPAbortOnMemory on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnMemory          54                 0.0

Check the orphan socket count on the client::

  nstatuser@nstat-a:~$ ss -s
  Total: 131 (kernel 0)
  TCP:   14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0

  Transport Total     IP        IPv6
  *         0         -         -
  RAW       1         0         1
  UDP       1         1         0
  TCP       14        13        1
  INET      16        14        2
  FRAG      0         0         0
105380cc4950Syupeng
The explanation of the test: after running server_orphan.py and
client_orphan.py, we had set up 64 connections between the server and
the client. After the iptables command was run, the server dropped all
packets from the client. When we typed Ctrl-C on client_orphan.py, the
client system tried to close these connections, and before they were
closed gracefully, they became orphan sockets. As the iptables rule on
the server blocked packets from the client, the server didn't receive
the FINs from the client, so all connections on the client were stuck
in the FIN_WAIT_1 state and would remain orphan sockets until they
timed out. We had echoed 10 to /proc/sys/net/ipv4/tcp_max_orphans, so
the client system only kept 10 orphan sockets; for all other orphan
sockets, the client system sent a RST and deleted them. We had 64
connections, so the 'ss -s' command showed the system had 10 orphan
sockets, and the value of TcpExtTCPAbortOnMemory was 54.
106880cc4950Syupeng
An additional explanation about the orphan socket count: you can find
the exact orphan socket count with the 'ss -s' command, but when the
kernel decides whether to increase TcpExtTCPAbortOnMemory and send a
RST, it doesn't always check the exact orphan socket count. To improve
performance, the kernel checks an approximate count first, and only if
the approximate count is more than tcp_max_orphans does it check the
exact count. So if the approximate count is less than
tcp_max_orphans but the exact count is more than tcp_max_orphans, you
will find that TcpExtTCPAbortOnMemory is not increased at all. If
tcp_max_orphans is large enough, this won't occur, but if you decrease
tcp_max_orphans to a small value as in our test, you might run into
this issue. That is why in our test the client set up 64 connections
although tcp_max_orphans was 10. If the client had only set up 11
connections, we might not see any change of TcpExtTCPAbortOnMemory.
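
The two-step check can be modelled as follows. This is a simplified
sketch of the behaviour described above, not the kernel code, and the
function and parameter names are hypothetical::

  def too_many_orphans(approx_count, exact_count, max_orphans):
      """Model the two-step orphan limit check."""
      # Fast path: the cheap approximate count is within the limit,
      # so the exact count is never consulted.
      if approx_count <= max_orphans:
          return False
      # Slow path: confirm the result with the exact count.
      return exact_count > max_orphans

With a limit of 10, too_many_orphans(5, 11, 10) returns False although
the exact count is above the limit. This is the case in which
TcpExtTCPAbortOnMemory is not increased.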
108380cc4950Syupeng
Continuing the previous test, we wait for several minutes. Because
the iptables rule on the server blocked the traffic, the server
doesn't receive the FINs, and all of the client's orphan sockets
finally time out in the FIN_WAIT_1 state. So after waiting for a few
minutes, we can find 10 timeouts on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnTimeout         10                 0.0
109280cc4950Syupeng
TcpExtTCPAbortOnLinger
----------------------
The server side code::

  nstatuser@nstat-b:~$ cat server_linger.py
  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)
110980cc4950Syupeng
111080cc4950SyupengThe client side code::
111180cc4950Syupeng
111280cc4950Syupeng  nstatuser@nstat-a:~$ cat client_linger.py
111380cc4950Syupeng  import socket
111480cc4950Syupeng  import struct
111580cc4950Syupeng
111680cc4950Syupeng  server = 'nstat-b' # server address
111780cc4950Syupeng  port = 9000
111880cc4950Syupeng
111980cc4950Syupeng  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
112080cc4950Syupeng  s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
112180cc4950Syupeng  s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
112280cc4950Syupeng  s.connect((server, port))
112380cc4950Syupeng  s.close()
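
The two setsockopt() calls in client_linger.py pass packed C values. As
a hedged illustration, the sketch below shows what those byte strings
contain, assuming the usual Linux layout of struct linger (two ints,
l_onoff followed by l_linger); pack_linger and unpack_linger are
hypothetical helper names.

```python
import struct


def pack_linger(onoff, seconds):
    # Assumed layout: struct linger { int l_onoff; int l_linger; }
    return struct.pack('ii', onoff, seconds)


def unpack_linger(raw):
    onoff, seconds = struct.unpack('ii', raw)
    return {'l_onoff': onoff, 'l_linger': seconds}


# Same values as in client_linger.py: lingering enabled, 10 second timeout.
raw = pack_linger(1, 10)
print(unpack_linger(raw))

# TCP_LINGER2 takes a single int; the test passes a negative value.
linger2 = struct.pack('i', -1)
print(struct.unpack('i', linger2)[0])
```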

Run server_linger.py on the server::

  nstatuser@nstat-b:~$ python3 server_linger.py

Run client_linger.py on the client::

  nstatuser@nstat-a:~$ python3 client_linger.py

After running client_linger.py, check the output of nstat::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnLinger          1                  0.0

TcpExtTCPRcvCoalesce
--------------------
On the server, we run a program which listens on TCP port 9000 but
doesn't read any data::

  import socket
  import time
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

Save the above code as server_coalesce.py, and run::

  python3 server_coalesce.py

On the client, save the code below as client_coalesce.py::

  import socket
  server = 'nstat-b'
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((server, port))

Run::

  nstatuser@nstat-a:~$ python3 -i client_coalesce.py

We use '-i' to enter interactive mode, then send a packet::

  >>> s.send(b'foo')
  3

Send a packet again::

  >>> s.send(b'bar')
  3

On the server, run nstat::

  ubuntu@nstat-b:~$ nstat
  #kernel
  IpInReceives                    2                  0.0
  IpInDelivers                    2                  0.0
  IpOutRequests                   2                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      2                  0.0
  TcpExtTCPRcvCoalesce            1                  0.0
  IpExtInOctets                   110                0.0
  IpExtOutOctets                  104                0.0
  IpExtInNoECTPkts                2                  0.0

The client sent two packets, and the server didn't read any data. When
the second packet arrived at the server, the first packet was still in
the receive queue. So the TCP layer merged the two packets, and we
could see TcpExtTCPRcvCoalesce increase by 1.
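
The octet counters in the nstat output above are consistent with the two
small segments and their ACKs. The arithmetic below is a sketch that
assumes a 20-byte IPv4 header, a 20-byte TCP header and 12 bytes of TCP
options (the timestamp option plus padding) on every segment; these
header sizes are assumptions about the test setup, not something the
output states.

```python
IP_HEADER = 20           # assumed: IPv4 header without options
TCP_HEADER = 20          # assumed: TCP header without options
TCP_TIMESTAMP_OPT = 12   # assumed: timestamp option (10 bytes) plus 2 bytes of padding


def ip_octets(payload_len, packets=1):
    """Total IP-layer octets for `packets` segments carrying `payload_len` bytes each."""
    return packets * (IP_HEADER + TCP_HEADER + TCP_TIMESTAMP_OPT + payload_len)


in_octets = ip_octets(payload_len=3, packets=2)    # 'foo' and 'bar' from the client
out_octets = ip_octets(payload_len=0, packets=2)   # two pure ACKs from the server
print(in_octets, out_octets)   # prints: 110 104
```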

TcpExtListenOverflows and TcpExtListenDrops
-------------------------------------------
On the server, run the nc command to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client, run 3 nc commands in different terminals::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

The nc command only accepts 1 connection, and the accept queue length
is 1. In the current Linux implementation, setting the queue length to n
means the actual queue length is n+1. Now we have created 3 connections:
1 is accepted by nc and 2 are in the accept queue, so the accept queue
is full.
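
The n+1 behavior can be pictured with a small model. This sketch assumes
the accept queue is treated as full only when the number of queued
connections is strictly greater than the configured backlog; the
function names are hypothetical and the code is not taken from the
kernel.

```python
def accept_queue_is_full(queued, backlog):
    # Assumed rule: full only when strictly more than `backlog` entries are queued.
    return queued > backlog


def connections_until_full(backlog):
    """Count how many connections can be queued before the queue reports full."""
    queued = 0
    while not accept_queue_is_full(queued, backlog):
        queued += 1
    return queued


backlog = 1                      # queue length used by nc in this test
queued = connections_until_full(backlog)
accepted = 1                     # the single connection nc has accepted
print(queued, accepted + queued)
```

With a backlog of 1 the model queues 2 connections, which together with
the accepted one gives the 3 connections created in the test.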

Before running the 4th nc, we clean the nstat history on the server::

  nstatuser@nstat-b:~$ nstat -n

Run the 4th nc on the client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000

If the nc server is running on kernel 4.10 or a later version, you
won't see the "Connection to ... succeeded!" string, because the kernel
will drop the SYN if the accept queue is full. If the nc server is
running on an older kernel, you would see that the connection succeeded,
because the kernel would complete the 3-way handshake and keep the
socket on the half-open queue. We did the test on kernel 4.15. Below is
the nstat output on the server::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    4                  0.0
  IpInDelivers                    4                  0.0
  TcpInSegs                       4                  0.0
  TcpExtListenOverflows           4                  0.0
  TcpExtListenDrops               4                  0.0
  IpExtInOctets                   240                0.0
  IpExtInNoECTPkts                4                  0.0

Both TcpExtListenOverflows and TcpExtListenDrops were 4. If the time
between the 4th nc and the nstat was longer, the values of
TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped and the client kept retrying.
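
How many SYNs the server sees depends on when nstat is run. The sketch
below estimates this under the assumptions that the client's first
retransmission happens after 1 second, that the timeout doubles on each
retry, and that at most 6 retries are made (the usual Linux defaults);
syn_send_times and syns_sent_by are hypothetical helpers.

```python
def syn_send_times(retries, initial_rto=1.0):
    """Times (in seconds) at which the initial SYN and its retries are sent."""
    times = [0.0]
    rto = initial_rto
    for _ in range(retries):
        times.append(times[-1] + rto)
        rto *= 2          # assumed exponential backoff
    return times


def syns_sent_by(elapsed, retries=6, initial_rto=1.0):
    """Number of SYNs sent no later than `elapsed` seconds after the first one."""
    return sum(1 for t in syn_send_times(retries, initial_rto) if t <= elapsed)


print(syn_send_times(6))
for elapsed in (0.5, 2, 7, 20):
    print(elapsed, syns_sent_by(elapsed))
```

Under these assumptions, a count of 4 would correspond to running nstat
between 7 and 15 seconds after starting the 4th nc.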

IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
-------------------------------------------------
* Server A IP address: 192.168.122.250
* Server B IP address: 192.168.122.251

Prepare on server A by adding a route to server B::

  $ sudo ip route add 8.8.8.8/32 via 192.168.122.251

Prepare on server B by disabling send_redirects for all interfaces::

  $ sudo sysctl -w net.ipv4.conf.all.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.ens3.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.lo.send_redirects=0
  $ sudo sysctl -w net.ipv4.conf.default.send_redirects=0

We want to let server A send a packet to 8.8.8.8, and route the packet
to server B. When server B receives such a packet, it might send an ICMP
Redirect message to server A; setting send_redirects to 0 disables
this behavior.

First, generate IpInAddrErrors. On server B, we disable IP forwarding::

  $ sudo sysctl -w net.ipv4.conf.all.forwarding=0

On server A, we send packets to 8.8.8.8::

  $ nc -v 8.8.8.8 53

On server B, we check the output of nstat::

  $ nstat
  #kernel
  IpInReceives                    3                  0.0
  IpInAddrErrors                  3                  0.0
  IpExtInOctets                   180                0.0
  IpExtInNoECTPkts                3                  0.0

As we have let server A route 8.8.8.8 to server B, and we disabled IP
forwarding on server B, server A sent packets to server B, then server B
dropped the packets and increased IpInAddrErrors. As the nc command
would re-send the SYN packet if it didn't receive a SYN+ACK, we could
find multiple IpInAddrErrors.

Second, generate IpExtInNoRoutes. On server B, we enable IP
forwarding::

  $ sudo sysctl -w net.ipv4.conf.all.forwarding=1

Check the route table of server B and remove the default route::

  $ ip route show
  default via 192.168.122.1 dev ens3 proto static
  192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.251
  $ sudo ip route delete default via 192.168.122.1 dev ens3 proto static

On server A, we contact 8.8.8.8 again::

  $ nc -v 8.8.8.8 53
  nc: connect to 8.8.8.8 port 53 (tcp) failed: Network is unreachable

On server B, run nstat::

  $ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutDestUnreachs             1                  0.0
  IcmpMsgOutType3                 1                  0.0
  IpExtInNoRoutes                 1                  0.0
  IpExtInOctets                   60                 0.0
  IpExtOutOctets                  88                 0.0
  IpExtInNoECTPkts                1                  0.0

We enabled IP forwarding on server B, so when server B received a packet
whose destination IP address was 8.8.8.8, server B tried to forward
this packet. We had deleted the default route, so there was no route for
8.8.8.8; server B increased IpExtInNoRoutes and sent the "ICMP
Destination Unreachable" message to server A.

Third, generate IpOutNoRoutes. Run the ping command on server B::

  $ ping -c 1 8.8.8.8
  connect: Network is unreachable

Run nstat on server B::

  $ nstat
  #kernel
  IpOutNoRoutes                   1                  0.0

We have deleted the default route on server B. Server B couldn't find
a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.
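
The three cases in this section can be summarized as a small decision
function. It is only a model of what the tests above observed on server
B, not the kernel's routing code, and the function and parameter names
are made up for illustration.

```python
def counter_for_packet(locally_generated, forwarding_enabled, has_route):
    """Return the counter observed in this section's tests, or None.

    Models only packets whose destination (8.8.8.8) is not a local address.
    """
    if locally_generated:
        # Third test: ping from server B with no route to the destination.
        return None if has_route else 'IpOutNoRoutes'
    if not forwarding_enabled:
        # First test: packet from server A arrives while forwarding is off.
        return 'IpInAddrErrors'
    if not has_route:
        # Second test: forwarding is on, but the default route was deleted.
        return 'IpExtInNoRoutes'
    return None


print(counter_for_packet(False, forwarding_enabled=False, has_route=True))
print(counter_for_packet(False, forwarding_enabled=True, has_route=False))
print(counter_for_packet(True, forwarding_enabled=True, has_route=False))
```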

TcpExtTCPACKSkippedSynRecv
--------------------------
In this test, we send 3 identical SYN packets from the client to the
server. The first SYN will let the server create a socket, set it to the
Syn-Recv state, and reply with a SYN/ACK. The second SYN will let the
server reply with the SYN/ACK again, and record the reply time (the
duplicate ACK reply time). The third SYN will let the server check the
previous duplicate ACK reply time, and decide to skip the duplicate ACK,
then increase the TcpExtTCPACKSkippedSynRecv counter.
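
The skip decision can be sketched as a simple rate limiter. This is an
illustrative model, not kernel code: the class name is hypothetical, and
the 0.5 second interval is an assumption based on the usual default of
the net.ipv4.tcp_invalid_ratelimit sysctl.

```python
class DupAckRateLimiter:
    """Toy model: skip a duplicate reply if one was sent too recently."""

    def __init__(self, ratelimit=0.5):
        self.ratelimit = ratelimit      # assumed interval, in seconds
        self.last_reply_time = None     # time of the last duplicate reply
        self.skipped = 0                # stands in for the TcpExtTCPACKSkipped* counter

    def should_skip(self, now):
        if self.last_reply_time is not None:
            elapsed = now - self.last_reply_time
            if 0 <= elapsed < self.ratelimit:
                self.skipped += 1
                return True
        # Not rate limited: record the reply time and send the reply.
        self.last_reply_time = now
        return False


limiter = DupAckRateLimiter()
# The first SYN creates the socket and does not go through the limiter.
second = limiter.should_skip(now=0.00)   # second SYN: reply sent, time recorded
third = limiter.should_skip(now=0.01)    # third SYN: arrives too soon, reply skipped
print(second, third, limiter.skipped)
```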

Run tcpdump to capture a SYN packet::

  nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

Open another terminal and run the nc command::

  nstatuser@nstat-a:~$ nc nstat-b 9000

As nstat-b didn't listen on port 9000, it should reply with a RST, and
the nc command exited immediately. This was enough for the tcpdump
command to capture a SYN packet. A Linux server might use hardware
offload for the TCP checksum, so the checksum in /tmp/syn.pcap
might not be correct. We call tcprewrite to fix it::

  nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum

On nstat-b, we run nc to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, we block the packets from port 9000, or nstat-a would send
a RST to nstat-b::

  nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP

Send 3 SYNs repeatedly to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done

Check the SNMP counter on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSynRecv      1                  0.0

As we expected, TcpExtTCPACKSkippedSynRecv is 1.

TcpExtTCPACKSkippedPAWS
-----------------------
To trigger PAWS, we could send an old SYN.

On nstat-b, let nc listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, run tcpdump to capture a SYN::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-a, run nc as a client to connect to nstat-b::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Now tcpdump has captured the SYN and exited. We should fix the
checksum::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum

Send the SYN packet twice::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done

On nstat-b, check the SNMP counter::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedPAWS         1                  0.0

We sent two SYNs via tcpreplay, and both of them failed the PAWS
check. nstat-b replied with an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
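
Why an old SYN fails the PAWS check can be sketched as a timestamp
comparison. The function below is a simplified model, assuming 32-bit
timestamps compared with wraparound; it ignores the extra conditions the
real check applies, and the names paws_rejects, ts_recent and rcv_tsval
are used here only for illustration.

```python
def to_s32(value):
    """Interpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def paws_rejects(ts_recent, rcv_tsval):
    # Simplified rule: reject when the segment's timestamp is older than
    # the most recent timestamp accepted on the connection.
    return to_s32(ts_recent - rcv_tsval) > 0


print(paws_rejects(ts_recent=1000, rcv_tsval=900))    # replayed, older timestamp
print(paws_rejects(ts_recent=1000, rcv_tsval=1000))   # same timestamp
print(paws_rejects(ts_recent=5, rcv_tsval=0xFFFFFFF0))  # older, across the wraparound
```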

TcpExtTCPACKSkippedSeq
----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have a valid
timestamp (to pass the PAWS check) but whose sequence number is out of
window. The Linux TCP stack would not skip the ACK if the packet has
data, so we need a pure ACK packet. To generate such a packet, we
could create two sockets: one on port 9000, another on port 9001. Then
we capture an ACK on port 9001 and change the source/destination port
numbers to match the port 9000 socket. Then we could trigger
TcpExtTCPACKSkippedSeq via this packet.
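
Whether a segment counts as out of window can be sketched with 32-bit
sequence arithmetic. This is a simplified model under the assumption
that a segment is acceptable when it ends at or after the left edge of
the window and starts at or before the right edge; the function names
and the sample numbers are hypothetical.

```python
MASK = 0xFFFFFFFF


def seq_before(a, b):
    """True if sequence number a is before b, with 32-bit wraparound."""
    return ((a - b) & MASK) >= 0x80000000


def seq_after(a, b):
    return seq_before(b, a)


def in_receive_window(seq, end_seq, rcv_wup, rcv_nxt, rcv_wnd):
    # Acceptable if the segment does not end before the left edge (rcv_wup)
    # and does not start after the right edge (rcv_nxt + rcv_wnd).
    right_edge = (rcv_nxt + rcv_wnd) & MASK
    return not seq_before(end_seq, rcv_wup) and not seq_after(seq, right_edge)


# A pure ACK has no payload, so seq == end_seq.
print(in_receive_window(1000, 1000, rcv_wup=1000, rcv_nxt=1000, rcv_wnd=100))
print(in_receive_window(5000, 5000, rcv_wup=1000, rcv_nxt=1000, rcv_wnd=100))
print(in_receive_window(500, 500, rcv_wup=1000, rcv_nxt=1000, rcv_wnd=100))
```

A pure ACK captured from the port 9001 connection carries that
connection's sequence numbers, so after rewriting the ports it would
typically fall outside the window of the port 9000 connection.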

On nstat-b, open two terminals and run two nc commands to listen on
both port 9000 and port 9001::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)

On nstat-a, run two nc clients::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

  nstatuser@nstat-a:~$ nc -v nstat-b 9001
  Connection to nstat-b 9001 port [tcp/*] succeeded!

On nstat-a, run tcpdump to capture an ACK::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-b, send a packet via the port 9001 socket. E.g. we sent the
string 'foo' in our example::

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)
  Connection from nstat-a 42132 received!
  foo

On nstat-a, tcpdump should have captured the ACK. We should check
the source port numbers of the two nc clients::

  nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
  State  Recv-Q   Send-Q         Local Address:Port           Peer Address:Port
  ESTAB  0        0            192.168.122.250:50208       192.168.122.251:9000
  ESTAB  0        0            192.168.122.250:42132       192.168.122.251:9001

Run tcprewrite to change port 9001 to port 9000 and port 42132 to
port 50208::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum

Now /tmp/seq.pcap is the packet we need. Send it to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done

Check TcpExtTCPACKSkippedSeq on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSeq          1                  0.0