============
SNMP counter
============

This document explains the meaning of SNMP counters.

General IPv4 counters
=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.

* IpInReceives

Defined in `RFC1213 ipInReceives`_

.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26

The number of packets received by the IP layer. It is increased at the
beginning of the ip_rcv function and is always updated together with
IpExtInOctets. It counts the aggregated segments after GRO/LRO.

* IpInDelivers

Defined in `RFC1213 ipInDelivers`_

.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28

The number of packets delivered to the upper layer protocols, e.g. TCP,
UDP, ICMP and so on. If no one listens on a raw socket, only packets of
kernel supported protocols will be delivered. If someone listens on a
raw socket, all valid IP packets will be delivered.

* IpOutRequests

Defined in `RFC1213 ipOutRequests`_

.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28

The number of packets sent via the IP layer, for both unicast and
multicast packets. It is always updated together with IpExtOutOctets.

* IpExtInOctets and IpExtOutOctets

They are Linux kernel extensions and have no RFC definitions. Please
note that RFC1213 does define ifInOctets and ifOutOctets, but they are
different things. The ifInOctets and ifOutOctets counters include the
MAC layer header size, but IpExtInOctets and IpExtOutOctets don't: they
only include the IP layer header and the IP layer data.

* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts

They indicate the number of four kinds of ECN IP packets. Please refer
to `Explicit Congestion Notification`_ for more details.

.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6

These 4 counters count how many packets are received per ECN
status. They count the real frame number regardless of LRO/GRO. So
for the same packet, you might find that IpInReceives counts 1, but
IpExtInNoECTPkts counts 2 or more.
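
As a rough illustration of how a received packet maps to one of these
four counters, the sketch below classifies a packet by the two ECN bits
of the IPv4 TOS byte, using the codepoints from RFC 3168. The name
``ecn_counter_name`` is made up for this example, it is not a kernel
interface::

  # ECN codepoints from RFC 3168: the two low-order bits of the TOS byte.
  ECN_COUNTERS = {
      0b00: "IpExtInNoECTPkts",  # Not-ECT
      0b01: "IpExtInECT1Pkts",   # ECT(1)
      0b10: "IpExtInECT0Pkts",   # ECT(0)
      0b11: "IpExtInCEPkts",     # CE
  }

  def ecn_counter_name(tos):
      """Return the counter that a packet with this TOS byte would increase."""
      return ECN_COUNTERS[tos & 0x3]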

ICMP counters
=============
* IcmpInMsgs and IcmpOutMsgs

Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_

.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43

As mentioned in RFC1213, these two counters include errors: they
are increased even if the ICMP packet has an invalid type. The
ICMP output path will check the header of a raw socket, so
IcmpOutMsgs is still updated if the IP header is constructed by
a userspace program.

* ICMP named types

| These counters cover most of the common ICMP types. They are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
| IcmpInRedirects: `RFC1213 icmpInRedirects`_
| IcmpInEchos: `RFC1213 icmpInEchos`_
| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
| IcmpOutEchos: `RFC1213 icmpOutEchos`_
| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_

.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43

.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46

Every ICMP type has two counters: 'In' and 'Out'. E.g., for the ICMP
Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
straightforward. The 'In' counter means the kernel received such a
packet and the 'Out' counter means the kernel sent such a packet.

* ICMP numeric types

They are IcmpMsgInType[N] and IcmpMsgOutType[N], where [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definitions can be found in the `ICMP parameters`_
document.

.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml

For example, if the Linux kernel sends an ICMP Echo packet,
IcmpMsgOutType8 increases by 1. And if the kernel gets an ICMP Echo
Reply packet, IcmpMsgInType0 increases by 1.

* IcmpInCsumErrors

This counter indicates that the checksum of the ICMP packet is
wrong. The kernel verifies the checksum after updating IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has a bad checksum,
IcmpInMsgs is updated but none of IcmpMsgInType[N] is updated.

* IcmpInErrors and IcmpOutErrors

Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_

.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43

When an error occurs in the ICMP packet handler path, these two
counters are updated. The receiving packet path uses IcmpInErrors
and the sending packet path uses IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors is always increased too.

Relationship of the ICMP counters
---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal to or larger than IcmpInMsgs. When the
kernel receives an ICMP packet, it follows the logic below:

1. increase IcmpInMsgs
2. if there is any error, update IcmpInErrors and finish the process
3. update IcmpMsgInType[N]
4. handle the packet depending on the type; if there is any error, update
   IcmpInErrors and finish the process

So if all errors occur in step (2), IcmpInMsgs should be equal to the
sum of IcmpMsgInType[N] plus IcmpInErrors. If all errors occur in
step (4), IcmpInMsgs should be equal to the sum of
IcmpMsgInType[N]. If the errors occur in both step (2) and step (4),
IcmpInMsgs should be less than the sum of IcmpMsgInType[N] plus
IcmpInErrors.
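
The four receive steps can be modelled in a few lines of Python to see
why the sum of the per-type counters plus IcmpInErrors never falls below
IcmpInMsgs. This is only a sketch of the logic described above, not
kernel code: the function names and the two error flags are hypothetical
inputs chosen for the example::

  from collections import Counter

  def icmp_receive(stats, icmp_type, early_error=False, handler_error=False):
      """Model the receive steps for one ICMP packet."""
      stats["IcmpInMsgs"] += 1                    # step 1
      if early_error:                             # step 2, e.g. a bad checksum
          stats["IcmpInErrors"] += 1
          return
      stats["IcmpMsgInType%d" % icmp_type] += 1   # step 3
      if handler_error:                           # step 4
          stats["IcmpInErrors"] += 1

  def in_type_sum(stats):
      """Sum of all per-type receive counters."""
      return sum(v for k, v in stats.items() if k.startswith("IcmpMsgInType"))

  stats = Counter()
  icmp_receive(stats, 0)                      # a good Echo Reply
  icmp_receive(stats, 8, early_error=True)    # error in step 2
  icmp_receive(stats, 3, handler_error=True)  # error in step 4

  # Errors happened in both step 2 and step 4, so IcmpInMsgs is smaller
  # than the per-type sum plus IcmpInErrors.
  print(stats["IcmpInMsgs"], in_type_sum(stats) + stats["IcmpInErrors"])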

General TCP counters
====================
* TcpInSegs

Defined in `RFC1213 tcpInSegs`_

.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets received by the TCP layer. As mentioned in
RFC1213, it includes the packets received in error, such as checksum
errors, invalid TCP headers and so on. Only one error won't be included:
if the layer 2 destination address is not the NIC's layer 2
address. It might happen if the packet is a multicast or broadcast
packet, or the NIC is in promiscuous mode. In these situations, the
packets are delivered to the TCP layer, but the TCP layer discards
them before increasing TcpInSegs. The TcpInSegs counter
isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter only increases by 1.

* TcpOutSegs

Defined in `RFC1213 tcpOutSegs`_

.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets sent by the TCP layer. As mentioned in RFC1213,
it excludes the retransmitted packets, but it includes the SYN, ACK
and RST packets. Unlike TcpInSegs, TcpOutSegs is aware of
GSO, so if a packet is split into 2 by GSO, TcpOutSegs
increases by 2.
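
The different GRO and GSO behavior of the two counters can be
illustrated with a small model. It only restates the counting rule
described above and is not kernel code; the function names and the
list-based packet representation are assumptions made for the example::

  def tcp_in_segs_delta(gro_packets):
      """TcpInSegs is not GRO aware: each merged packet counts once.

      gro_packets lists the packets as seen by TCP after GRO; each item
      is the list of wire segments that were merged into that packet.
      """
      return len(gro_packets)

  def tcp_out_segs_delta(gso_segment_counts):
      """TcpOutSegs is GSO aware: each packet counts once per segment.

      gso_segment_counts holds, for each packet handed down by TCP, the
      number of segments GSO splits it into.
      """
      return sum(gso_segment_counts)

  # Two wire segments merged by GRO are counted as one received packet.
  in_delta = tcp_in_segs_delta([["seg1", "seg2"]])
  # One packet split into two segments by GSO is counted twice.
  out_delta = tcp_out_segs_delta([2])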

* TcpActiveOpens

Defined in `RFC1213 tcpActiveOpens`_

.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer sends a SYN and comes into the SYN-SENT
state. Every time TcpActiveOpens increases by 1, TcpOutSegs should
always increase by 1 as well.

* TcpPassiveOpens

Defined in `RFC1213 tcpPassiveOpens`_

.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer receives a SYN, replies with a SYN+ACK, and comes
into the SYN-RCVD state.

TCP Fast Path
=============
When the kernel receives a TCP packet, it has two paths to handle the
packet: one is the fast path, the other is the slow path. The comment in
the kernel code provides a good explanation of them, pasted below::

  It is split into a fast path and a slow path. The fast path is
  disabled when:

  - A zero window was announced from us
  - zero window probing
    is only handled properly on the slow path.
  - Out of order segments arrived.
  - Urgent data is expected.
  - There is no buffer space left
  - Unexpected TCP flags/window values/header lengths are received
    (detected by checking the TCP header against pred_flags)
  - Data is sent in both directions. The fast path only supports pure senders
    or pure receivers (this means either the sequence number or the ack
    value must stay constant)
  - Unexpected TCP option.

The kernel will try to use the fast path unless any of the above
conditions is satisfied. If the packets are out of order, the kernel
will handle them in the slow path, which means the performance might
not be very good. The kernel also comes into the slow path if "Delayed
ack" is used, because when using "Delayed ack", the data is sent in both
directions. When the TCP window scale option is not used, the kernel will
try to enable the fast path immediately when the connection comes into the
established state, but if the TCP window scale option is used, the kernel
will disable the fast path at first, and try to enable it after it
receives packets.

* TcpExtTCPPureAcks and TcpExtTCPHPAcks

If a packet has the ACK flag set and carries no data, it is a pure ACK
packet. If the kernel handles it in the fast path, TcpExtTCPHPAcks
increases by 1; if the kernel handles it in the slow path,
TcpExtTCPPureAcks increases by 1.

* TcpExtTCPHPHits

If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits
increases by 1.
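
The rules for these three counters can be summarized as a small decision
function. This is a sketch of the description above rather than the
kernel implementation; ``fast_path_counter`` and its two boolean
parameters are names invented for the example, and counters not
described in this section are ignored::

  def fast_path_counter(has_data, fast_path):
      """Return the counter increased for a received packet, or None."""
      if not has_data:
          # A pure ACK is counted in either path, by different counters.
          return "TcpExtTCPHPAcks" if fast_path else "TcpExtTCPPureAcks"
      # A data packet is only counted here when it hits the fast path.
      return "TcpExtTCPHPHits" if fast_path else None

For example, ``fast_path_counter(False, False)`` names the counter for a
pure ACK that is handled in the slow path.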

TCP abort
=========

* TcpExtTCPAbortOnData

It means the TCP layer has data in flight, but needs to close the
connection. So the TCP layer sends a RST to the other side, indicating
that the connection is not closed gracefully. An easy way to increase
this counter is to use the SO_LINGER option. Please refer to the
SO_LINGER section of the `socket man page`_:

.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html

By default, when an application closes a connection, the close function
returns immediately and the kernel tries to send the in-flight data
asynchronously. If you use the SO_LINGER option with l_onoff set to 1
and l_linger set to a positive number, the close function won't return
immediately, but waits until the in-flight data is acked by the other
side; the max wait time is l_linger seconds. If l_onoff is set to 1 and
l_linger is set to 0, when the application closes a connection, the
kernel will send a RST immediately and increase the
TcpExtTCPAbortOnData counter.
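
As a minimal sketch of the last case, the Python snippet below sets
l_onoff to 1 and l_linger to 0 on a socket and reads the option back. It
assumes that struct linger consists of two C ints, as on Linux; the
helper names are made up for this example. Whether a later close()
really increases TcpExtTCPAbortOnData still depends on the state of the
connection, as described above::

  import socket
  import struct

  def enable_abortive_close(sock):
      """Set l_onoff=1 and l_linger=0 on the socket."""
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack('ii', 1, 0))

  def get_linger(sock):
      """Return the current (l_onoff, l_linger) pair of the socket."""
      raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                            struct.calcsize('ii'))
      return struct.unpack('ii', raw)

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  enable_abortive_close(s)
  linger_state = get_linger(s)
  s.close()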

* TcpExtTCPAbortOnClose

This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
the kernel will send a RST to the other side of the TCP connection.

* TcpExtTCPAbortOnMemory

When an application closes a TCP connection, the kernel still needs to
track the connection and let it complete the TCP disconnect process.
E.g. an app calls the close method of a socket, the kernel sends a FIN
to the other side of the connection, and then the app has no
relationship with the socket any more. But the kernel needs to keep the
socket, which becomes an orphan socket. The kernel waits for the reply
of the other side, and the socket finally comes to the TIME_WAIT state.
When the kernel doesn't have enough memory to keep the orphan socket, it
sends a RST to the other side and deletes the socket. In such a
situation, the kernel increases TcpExtTCPAbortOnMemory by 1. Two
conditions trigger TcpExtTCPAbortOnMemory:

1. The memory used by the TCP protocol is higher than the third value of
   tcp_mem. Please refer to the tcp_mem section in the `TCP man page`_.

2. The orphan socket count is higher than net.ipv4.tcp_max_orphans.

.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html

* TcpExtTCPAbortOnTimeout

This counter will increase when any of the TCP timers expire. In such
a situation, the kernel won't send a RST, it just gives up the connection.

* TcpExtTCPAbortOnLinger

When a TCP connection comes into the FIN_WAIT_2 state, instead of waiting
for the FIN packet from the other side, the kernel could send a RST and
delete the socket immediately. This is not the default behavior of the
Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let the kernel follow this behavior.

* TcpExtTCPAbortFailed

The kernel TCP layer will send a RST if the condition described in
`RFC2525 2.17 section`_ is satisfied. If an internal error occurs during
this process, TcpExtTCPAbortFailed will be increased.

.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50

Examples
========

Ping test
---------
Run the ping command against the public DNS server 8.8.8.8::

  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms

  --- 8.8.8.8 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms

The nstat result::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0

The Linux server sent an ICMP Echo packet, so IpOutRequests,
IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased by 1. The
server got an ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
IcmpInEchoReps and IcmpMsgInType0 were increased by 1. The ICMP Echo
Reply was passed to the ICMP layer via the IP layer, so IpInDelivers was
increased by 1. The default ping data size is 56 bytes, as shown in the
first line of the ping output, so an ICMP Echo packet and its
corresponding Echo Reply packet are made up of:

* 14 bytes MAC header
* 20 bytes IP header
* 8 bytes ICMP header
* 56 bytes data (default value of the ping command)

So the IpExtInOctets and IpExtOutOctets are 20+8+56=84.
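
The byte arithmetic above can be checked with a few lines of Python.
This is a minimal sketch; the constant and function names are made up
for this example, and the 64 byte ICMP length is the ICMP header plus
the ping data from the breakdown above::

  MAC_HEADER_LEN = 14  # counted by ifInOctets/ifOutOctets only
  IP_HEADER_LEN = 20   # IPv4 header without options

  def ip_ext_octets(icmp_len):
      """Bytes one ICMP packet adds to IpExtInOctets or IpExtOutOctets."""
      return IP_HEADER_LEN + icmp_len

  def frame_octets(icmp_len):
      """Bytes of the whole frame, including the MAC header."""
      return MAC_HEADER_LEN + ip_ext_octets(icmp_len)

  icmp_len = 64  # ICMP header plus data of the default ping packet
  print(ip_ext_octets(icmp_len))  # 84
  print(frame_octets(icmp_len))   # 98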

TCP 3-way handshake
-------------------
On the server side, we run::

  nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client side, we run::

  nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
  Connection to 192.168.122.251 9000 port [tcp/*] succeeded!

The server listened on TCP port 9000, the client connected to it, and
they completed the 3-way handshake.

On the server side, we can find the nstat output below::

  nstatuser@nstat-b:~$ nstat | grep -i tcp
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0

On the client side, we can find the nstat output below::

  nstatuser@nstat-a:~$ nstat | grep -i tcp
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0

When the server received the first SYN, it replied with a SYN+ACK and
came into the SYN-RCVD state, so TcpPassiveOpens increased by 1. The
server received a SYN, sent a SYN+ACK and received an ACK, so the server
sent 1 packet and received 2 packets: TcpInSegs increased by 2 and
TcpOutSegs increased by 1. The last ACK of the 3-way handshake is a pure
ACK without data, so TcpExtTCPPureAcks increased by 1.

When the client sent the SYN, the client came into the SYN-SENT state, so
TcpActiveOpens increased by 1. The client sent a SYN, received a SYN+ACK
and sent an ACK, so the client sent 2 packets and received 1 packet:
TcpInSegs increased by 1 and TcpOutSegs increased by 2.
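
The two nstat outputs of the handshake can also be cross-checked
mechanically: every segment sent by one side is received by the other
side. The helper below is a sketch that assumes the three-column nstat
format shown in this document (name, value, rate); ``parse_nstat`` is a
name chosen for the example, not part of the nstat tool::

  def parse_nstat(text):
      """Parse nstat lines of the form 'name value rate' into a dict."""
      counters = {}
      for line in text.splitlines():
          fields = line.split()
          # Skip the '#kernel' header, shell prompts and blank lines.
          if len(fields) != 3:
              continue
          try:
              counters[fields[0]] = int(fields[1])
          except ValueError:
              continue
      return counters

  server = parse_nstat("""
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  """)
  client = parse_nstat("""
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0
  """)

  # Segments sent by one side are the segments received by the other.
  handshake_consistent = (client["TcpOutSegs"] == server["TcpInSegs"] and
                          server["TcpOutSegs"] == client["TcpInSegs"])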

TCP normal traffic
------------------
Run nc on the server::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

Run nc on the client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Input a string in the nc client ('hello' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello

The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Input a string in the nc client again ('world' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
  world

The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPAcks                 1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPHits                 1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Comparing the first client-side nstat with the second client-side nstat,
we can find one difference: the first one had a 'TcpExtTCPPureAcks',
but the second one had a 'TcpExtTCPHPAcks'. The first server-side
nstat and the second server-side nstat had a difference too: the
second server-side nstat had a 'TcpExtTCPHPHits', but the first
server-side nstat didn't have it. The network traffic patterns were
exactly the same: the client sent a packet to the server, and the server
replied with an ACK. But the kernel handled them in different ways. When
the TCP window scale option is not used, the kernel will try to enable
the fast path immediately when the connection comes into the established
state, but if the TCP window scale option is used, the kernel will
disable the fast path at first, and try to enable it after the kernel
receives packets. We can use the 'ss' command to verify whether the
window scale option is used, e.g. run the command below on either the
server or the client::

  nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )'
  Netid    Recv-Q     Send-Q            Local Address:Port             Peer Address:Port
  tcp      0          0               192.168.122.250:40654         192.168.122.251:9000
             ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98

The 'wscale:7,7' means both the server and the client set the window
scale option to 7. Now we can explain the nstat output in our test:

In the first nstat output of the client side, the client sent a packet
and the server replied with an ACK. When the kernel handled this ACK, the
fast path was not enabled, so the ACK was counted into
'TcpExtTCPPureAcks'.

In the second nstat output of the client side, the client sent a packet
again and received another ACK from the server. This time, the fast path
was enabled and the ACK qualified for the fast path, so it was handled by
the fast path and counted into 'TcpExtTCPHPAcks'.

In the first nstat output of the server side, the fast path was not
enabled, so there was no 'TcpExtTCPHPHits'.

In the second nstat output of the server side, the fast path was enabled,
and the packet received from the client qualified for the fast path, so
it was counted into 'TcpExtTCPHPHits'.

TcpExtTCPAbortOnClose
---------------------
On the server side, we run the python script below::

  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  # never read from the accepted connection
  while True:
      time.sleep(9999999)

This python script listens on port 9000, but doesn't read anything from
the connection.

On the client side, we send the string "hello" by nc::

  nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000

Then, we come back to the server side. The server has received the
"hello" packet, and the TCP layer has acked this packet, but the
application hasn't read it yet. We type Ctrl-C to terminate the server
script. Then we can find that TcpExtTCPAbortOnClose increased by 1 on the
server side::

  nstatuser@nstat-b:~$ nstat | grep -i abort
  TcpExtTCPAbortOnClose           1                  0.0

If we run tcpdump on the server side, we can find that the server sent a
RST after we typed Ctrl-C.

TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
--------------------------------------------------
Below is an example which lets the orphan socket count be higher than
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on the client::

  sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"

Client code (creates 64 connections to the server)::

  nstatuser@nstat-a:~$ cat client_orphan.py
  import socket
  import time

  server = 'nstat-b' # server address
  port = 9000

  count = 64

  connection_list = []

  # open all connections and keep them referenced so they stay open
  for i in range(count):
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect((server, port))
      connection_list.append(s)
      print("connection_count: %d" % len(connection_list))

  while True:
      time.sleep(99999)

Server code (accepts 64 connections from the client)::

  nstatuser@nstat-b:~$ cat server_orphan.py
  import socket
  import time

  port = 9000
  count = 64

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(count)
  connection_list = []
  # accept connections and keep them referenced so they stay open
  while True:
      sock, addr = s.accept()
      connection_list.append((sock, addr))
      print("connection_count: %d" % len(connection_list))

Run the python scripts on the server and the client.

On the server::

  python3 server_orphan.py

On the client::

  python3 client_orphan.py

Run iptables on the server::

  sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP

Type Ctrl-C on the client to stop client_orphan.py.

Check TcpExtTCPAbortOnMemory on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnMemory          54                 0.0

Check the orphan socket count on the client::

  nstatuser@nstat-a:~$ ss -s
  Total: 131 (kernel 0)
  TCP:   14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0

  Transport Total     IP        IPv6
  *         0         -         -
  RAW       1         0         1
  UDP       1         1         0
  TCP       14        13        1
  INET      16        14        2
  FRAG      0         0         0

The explanation of the test: after running server_orphan.py and
client_orphan.py, we set up 64 connections between the server and the
client. After we ran the iptables command, the server dropped all packets
from the client. When we typed Ctrl-C on client_orphan.py, the client
system tried to close these connections, and before they were closed
gracefully, these connections became orphan sockets. As the iptables
rule on the server blocked packets from the client, the server didn't
receive the FIN from the client, so all connections on the client were
stuck in the FIN_WAIT_1 state, and they stayed as orphan sockets until
they timed out. We had written 10 to /proc/sys/net/ipv4/tcp_max_orphans,
so the client system would only keep 10 orphan sockets. For all other
orphan sockets, the client system sent a RST and deleted them. We had 64
connections, so the 'ss -s' command showed that the system had 10 orphan
sockets, and the value of TcpExtTCPAbortOnMemory was 54.

An additional explanation about the orphan socket count: you can find the
exact orphan socket count by the 'ss -s' command, but when the kernel
decides whether to increase TcpExtTCPAbortOnMemory and send a RST, it
doesn't always check the exact orphan socket count. To improve
performance, the kernel checks an approximate count first. If the
approximate count is more than tcp_max_orphans, the kernel checks the
exact count again. So if the approximate count is less than
tcp_max_orphans, but the exact count is more than tcp_max_orphans, you
will find that TcpExtTCPAbortOnMemory is not increased at all. If
tcp_max_orphans is large enough, this won't occur, but if you decrease
tcp_max_orphans to a small value like in our test, you might find this
issue. That is why, in our test, the client set up 64 connections
although tcp_max_orphans is 10. If the client only set up 11
connections, we wouldn't find any change of TcpExtTCPAbortOnMemory.

Continuing the previous test, we wait for several minutes. Because the
iptables rule on the server blocked the traffic, the server wouldn't
receive the FIN, and all the client's orphan sockets would finally time
out in the FIN_WAIT_1 state. So after waiting for a few minutes, we can
find 10 timeouts on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnTimeout         10                 0.0

TcpExtTCPAbortOnLinger
----------------------
The server side code::

  nstatuser@nstat-b:~$ cat server_linger.py
  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

The client side code::

  nstatuser@nstat-a:~$ cat client_linger.py
  import socket
  import struct

  server = 'nstat-b' # server address
  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  # l_onoff=1, l_linger=10: close() waits up to 10 seconds
  s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
  # a negative TCP_LINGER2 asks the kernel not to wait in FIN_WAIT_2
  s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
  s.connect((server, port))
  s.close()

Run server_linger.py on the server::

  nstatuser@nstat-b:~$ python3 server_linger.py

Run client_linger.py on the client::

  nstatuser@nstat-a:~$ python3 client_linger.py

After running client_linger.py, check the output of nstat::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnLinger          1                  0.0