============
SNMP counter
============

This document explains the meaning of SNMP counters.

General IPv4 counters
=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.

* IpInReceives

Defined in `RFC1213 ipInReceives`_

.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26

The number of packets received by the IP layer. It is incremented at the
beginning of the ip_rcv function and is always updated together with
IpExtInOctets. It counts aggregated segments after GRO/LRO.

* IpInDelivers

Defined in `RFC1213 ipInDelivers`_

.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28

The number of packets delivered to the upper layer protocols, e.g. TCP, UDP,
ICMP and so on. If no one listens on a raw socket, only kernel
supported protocols will be delivered; if someone listens on a raw
socket, all valid IP packets will be delivered.

* IpOutRequests

Defined in `RFC1213 ipOutRequests`_

.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28

The number of packets sent via the IP layer, for both unicast and
multicast packets. It is always updated together with
IpExtOutOctets.

* IpExtInOctets and IpExtOutOctets

They are Linux kernel extensions with no RFC definitions. Please note that
RFC1213 does define ifInOctets and ifOutOctets, but they
are different things. ifInOctets and ifOutOctets include the MAC
layer header size, but IpExtInOctets and IpExtOutOctets don't; they
only include the IP layer header and the IP layer data.

* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts

They indicate the number of four kinds of ECN IP packets. Please refer to
`Explicit Congestion Notification`_ for more details.

.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6

These 4 counters count how many packets were received per ECN
status. They count the real frame number regardless of LRO/GRO. So
for the same packet, you might find that IpInReceives counts 1, but
IpExtInNoECTPkts counts 2 or more.

ICMP counters
=============
* IcmpInMsgs and IcmpOutMsgs

Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_

.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43

As mentioned in RFC1213, these two counters include errors; they
are incremented even if the ICMP packet has an invalid type. The
ICMP output path will check the header of a raw socket, so
IcmpOutMsgs is still updated if the IP header is constructed by
a userspace program.

* ICMP named types

| These counters include most of the common ICMP types. They are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
| IcmpInRedirects: `RFC1213 icmpInRedirects`_
| IcmpInEchos: `RFC1213 icmpInEchos`_
| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
| IcmpOutEchos: `RFC1213 icmpOutEchos`_
| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_

.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43

.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46

Every ICMP type has two counters: 'In' and 'Out'. E.g., for the ICMP
Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
straightforward.
The 'In' counter means the kernel receives such a packet
and the 'Out' counter means the kernel sends such a packet.

* ICMP numeric types

They are IcmpMsgInType[N] and IcmpMsgOutType[N], where [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definitions can be found in the `ICMP parameters`_
document.

.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml

For example, if the Linux kernel sends an ICMP Echo packet,
IcmpMsgOutType8 increases by 1. And if the kernel gets an ICMP Echo Reply
packet, IcmpMsgInType0 increases by 1.

* IcmpInCsumErrors

This counter indicates that the checksum of the ICMP packet is
wrong. The kernel verifies the checksum after updating IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has a bad checksum,
IcmpInMsgs is updated but none of the IcmpMsgInType[N] counters is updated.

* IcmpInErrors and IcmpOutErrors

Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_

.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43

When an error occurs in the ICMP packet handler path, these two
counters are updated. The receiving packet path uses IcmpInErrors
and the sending packet path uses IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors is always increased too.

relationship of the ICMP counters
---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal to or larger than IcmpInMsgs. When the kernel
receives an ICMP packet, it follows the logic below:

1. increase IcmpInMsgs
2. if there is any error, update IcmpInErrors and finish the process
3. update IcmpMsgInType[N]
4. handle the packet depending on the type; if there is any error, update
   IcmpInErrors and finish the process

So if all errors occur in step (2), IcmpInMsgs should be equal to the
sum of IcmpMsgInType[N] plus IcmpInErrors. If all errors occur in
step (4), IcmpInMsgs should be equal to the sum of
IcmpMsgInType[N]. If errors occur in both step (2) and step (4),
IcmpInMsgs should be less than the sum of IcmpMsgInType[N] plus
IcmpInErrors.

General TCP counters
====================
* TcpInSegs

Defined in `RFC1213 tcpInSegs`_

.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets received by the TCP layer. As mentioned in
RFC1213, it includes the packets received in error, such as checksum
errors, invalid TCP headers and so on. Only one error won't be included:
if the layer 2 destination address is not the NIC's layer 2
address. It might happen if the packet is a multicast or broadcast
packet, or the NIC is in promiscuous mode. In these situations, the
packets would be delivered to the TCP layer, but the TCP layer will discard
these packets before increasing TcpInSegs. The TcpInSegs counter
isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter would only increase by 1.

* TcpOutSegs

Defined in `RFC1213 tcpOutSegs`_

.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets sent by the TCP layer. As mentioned in RFC1213,
it excludes the retransmitted packets, but it includes the SYN, ACK
and RST packets. Unlike TcpInSegs, TcpOutSegs is aware of
GSO, so if a packet is split into 2 by GSO, TcpOutSegs will
increase by 2.

* TcpActiveOpens

Defined in `RFC1213 tcpActiveOpens`_

.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer sends a SYN and enters the SYN-SENT
state.
Every time TcpActiveOpens increases by 1, TcpOutSegs should always
increase by 1.

* TcpPassiveOpens

Defined in `RFC1213 tcpPassiveOpens`_

.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer receives a SYN, replies with a SYN+ACK, and enters
the SYN-RCVD state.

TCP Fast Path
=============
When the kernel receives a TCP packet, it has two paths to handle the
packet: one is the fast path, the other is the slow path. The comment in the
kernel code provides a good explanation of them; it is pasted below::

  It is split into a fast path and a slow path. The fast path is
  disabled when:

  - A zero window was announced from us
  - zero window probing
    is only handled properly on the slow path.
  - Out of order segments arrived.
  - Urgent data is expected.
  - There is no buffer space left
  - Unexpected TCP flags/window values/header lengths are received
    (detected by checking the TCP header against pred_flags)
  - Data is sent in both directions. The fast path only supports pure senders
    or pure receivers (this means either the sequence number or the ack
    value must stay constant)
  - Unexpected TCP option.

The kernel will try to use the fast path unless any of the above conditions
is satisfied. If the packets are out of order, the kernel will handle
them in the slow path, which means the performance might not be very
good. The kernel would also enter the slow path if "Delayed ack" is
used, because when using "Delayed ack", the data is sent in both
directions. When the TCP window scale option is not used, the kernel will
try to enable the fast path immediately when the connection enters the
established state, but if the TCP window scale option is used, the kernel
will disable the fast path at first, and try to enable it after it
receives packets.

* TcpExtTCPPureAcks and TcpExtTCPHPAcks

If a packet has the ACK flag set and has no data, it is a pure ACK packet. If
the kernel handles it in the fast path, TcpExtTCPHPAcks will increase by 1;
if the kernel handles it in the slow path, TcpExtTCPPureAcks will
increase by 1.

* TcpExtTCPHPHits

If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will
increase by 1.


TCP abort
=========

* TcpExtTCPAbortOnData

It means the TCP layer has data in flight, but needs to close the
connection. So the TCP layer sends a RST to the other side, indicating the
connection was not closed gracefully. An easy way to increase this
counter is to use the SO_LINGER option. Please refer to the SO_LINGER
section of the `socket man page`_:

.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html

By default, when an application closes a connection, the close function
returns immediately and the kernel tries to send the in-flight data
asynchronously. If you use the SO_LINGER option, set l_onoff to 1, and set
l_linger to a positive number, the close function won't return immediately,
but waits until the in-flight data is acked by the other side; the maximum
wait time is l_linger seconds. If you set l_onoff to 1 and l_linger to 0,
when the application closes a connection, the kernel will send a RST
immediately and increase the TcpExtTCPAbortOnData counter.

* TcpExtTCPAbortOnClose

This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
the kernel will send a RST to the other side of the TCP connection.

* TcpExtTCPAbortOnMemory

When an application closes a TCP connection, the kernel still needs to track
the connection to let it complete the TCP disconnect process. For example, an
app calls the close method of a socket and the kernel sends a FIN to the other
side of the connection. The app then has no relationship with the
socket any more, but the kernel needs to keep the socket. This socket
becomes an orphan socket; the kernel waits for the reply from the other side
and would finally enter the TIME_WAIT state. When the kernel doesn't have
enough memory to keep the orphan socket, it sends a RST to
the other side and deletes the socket. In such a situation, the kernel will
increase TcpExtTCPAbortOnMemory by 1. Two conditions would trigger
TcpExtTCPAbortOnMemory:

1. the memory used by the TCP protocol is higher than the third value of
the tcp_mem. Please refer to the tcp_mem section in the `TCP man page`_:

.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html

2. the orphan socket count is higher than net.ipv4.tcp_max_orphans


* TcpExtTCPAbortOnTimeout

This counter will increase when any of the TCP timers expire. In such a
situation, the kernel won't send a RST; it just gives up the connection.

* TcpExtTCPAbortOnLinger

When a TCP connection enters the FIN_WAIT_2 state, instead of waiting
for the FIN packet from the other side, the kernel could send a RST and
delete the socket immediately. This is not the default behavior of the
Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let the kernel follow this behavior.

* TcpExtTCPAbortFailed

The kernel TCP layer will send a RST if the `RFC2525 2.17 section`_ is
satisfied. If an internal error occurs during this process,
TcpExtTCPAbortFailed will be increased.

.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50

examples
========

ping test
---------
Run the ping command against the public DNS server 8.8.8.8::

  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms

  --- 8.8.8.8 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms

The nstat result::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0

The Linux server sent an ICMP Echo packet, so IpOutRequests,
IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased by 1. The
server got an ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
IcmpInEchoReps and IcmpMsgInType0 were increased by 1. The ICMP Echo Reply
was passed to the ICMP layer via the IP layer, so IpInDelivers was
increased by 1. The default ping data size is 56 bytes (the "56(84)" in the
ping output), so an ICMP Echo packet and its corresponding Echo Reply
packet are constructed from:

* 14 bytes MAC header
* 20 bytes IP header
* 8 bytes ICMP header
* 56 bytes data (default value of the ping command)

IpExtInOctets and IpExtOutOctets exclude the MAC header, so they are
20+8+56=84.

tcp 3-way handshake
-------------------
On server side, we run::

  nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On client side, we run::

  nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
  Connection to 192.168.122.251 9000 port [tcp/*] succeeded!

The server listened on TCP port 9000, the client connected to it, and they
completed the 3-way handshake.

On server side, we can find the nstat output below::

  nstatuser@nstat-b:~$ nstat | grep -i tcp
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0

On client side, we can find the nstat output below::

  nstatuser@nstat-a:~$ nstat | grep -i tcp
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0

When the server received the first SYN, it replied with a SYN+ACK and entered
the SYN-RCVD state, so TcpPassiveOpens increased by 1. The server received
a SYN, sent a SYN+ACK, and received an ACK, so the server sent 1 packet and
received 2 packets: TcpInSegs increased by 2, TcpOutSegs increased by 1. The
last ACK of the 3-way handshake is a pure ACK without data, so
TcpExtTCPPureAcks increased by 1.

When the client sent the SYN, the client entered the SYN-SENT state, so
TcpActiveOpens increased by 1. The client sent a SYN, received a SYN+ACK, and
sent an ACK, so the client sent 2 packets and received 1 packet: TcpInSegs
increased by 1, TcpOutSegs increased by 2.

TCP normal traffic
------------------
Run nc on server::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

Run nc on client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Input a string in the nc client ('hello' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello

The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Input a string in the nc client again ('world' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
  world

Client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPAcks                 1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0


Server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPHits                 1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Comparing the first client-side nstat with the second client-side nstat,
we find one difference: the first one had a 'TcpExtTCPPureAcks',
but the second one had a 'TcpExtTCPHPAcks'. The first server-side
nstat and the second server-side nstat had a difference too: the
second server-side nstat had a TcpExtTCPHPHits, but the first
server-side nstat didn't have it. The network traffic patterns were
exactly the same: the client sent a packet to the server, and the server
replied with an ACK. But the kernel handled them in different ways.
When the
TCP window scale option is not used, the kernel will try to enable the fast
path immediately when the connection enters the established state,
but if the TCP window scale option is used, the kernel will disable the
fast path at first, and try to enable it after the kernel receives
packets. We could use the 'ss' command to verify whether the window
scale option is used, e.g. run the command below on either server or
client::

  nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )'
  Netid    Recv-Q     Send-Q            Local Address:Port             Peer Address:Port
  tcp      0          0               192.168.122.250:40654         192.168.122.251:9000
               ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98

The 'wscale:7,7' means both server and client set the window scale
option to 7. Now we can explain the nstat output in our test:

In the first nstat output of the client side, the client sent a packet and the
server replied with an ACK. When the kernel handled this ACK, the fast path
was not enabled, so the ACK was counted into 'TcpExtTCPPureAcks'.

In the second nstat output of the client side, the client sent a packet again
and received another ACK from the server. This time, the fast path was
enabled and the ACK qualified for the fast path, so it was handled by
the fast path and counted into TcpExtTCPHPAcks.

In the first nstat output of the server side, the fast path was not enabled,
so there was no 'TcpExtTCPHPHits'.

In the second nstat output of the server side, the fast path was enabled,
and the packet received from the client qualified for the fast path, so it
was counted into 'TcpExtTCPHPHits'.
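
The comparison of two nstat runs above can also be done by hand. The
following is a minimal sketch, not part of the kernel or of nstat: it
assumes the TcpExt and IpExt counters are exposed in /proc/net/netstat as
pairs of header and value lines, and the helper names (parse_netstat,
counter_delta, fast_path_ack_share, read_netstat) are hypothetical. It
takes two snapshots, keeps the counters that changed, and reports which
share of pure ACKs was handled in the fast path.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat style text into {"TcpExtTCPHPHits": 3, ...}."""
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "TcpExt: name1 name2 ..." then "TcpExt: v1 v2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError("header/value lines do not match: %r" % prefix)
        for name, number in zip(names.split(), numbers.split()):
            # nstat style name: group prefix followed by the counter name.
            counters[prefix + name] = int(number)
    return counters


def counter_delta(before, after):
    """Return only the counters whose value changed between two snapshots."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] != before.get(name, 0)}


def fast_path_ack_share(delta):
    """Share of pure ACKs handled in the fast path, or None if there were none."""
    fast = delta.get("TcpExtTCPHPAcks", 0)
    slow = delta.get("TcpExtTCPPureAcks", 0)
    total = fast + slow
    return fast / total if total else None


def read_netstat(path="/proc/net/netstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_netstat(f.read())
```

A typical use would be to call read_netstat() before and after sending the
'world' string and pass both results to counter_delta(). Unlike nstat, this
sketch keeps no history file and does not handle counter wraparound.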

TcpExtTCPAbortOnClose
---------------------
On the server side, we run the Python script below::

  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  # Never read from the connection; just keep the process alive.
  while True:
      time.sleep(9999999)

This Python script listens on port 9000, but doesn't read anything from
the connection.

On the client side, we send the string "hello" by nc::

  nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000

Then, we come back to the server side. The server has received the "hello"
packet, and the TCP layer has acked this packet, but the application hasn't
read it yet. We type Ctrl-C to terminate the server script. Then we
can find that TcpExtTCPAbortOnClose increased by 1 on the server side::

  nstatuser@nstat-b:~$ nstat | grep -i abort
  TcpExtTCPAbortOnClose           1                  0.0

If we run tcpdump on the server side, we can see that the server sent a
RST after we typed Ctrl-C.

TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
--------------------------------------------------
Below is an example which lets the orphan socket count be higher than
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on client::

  sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"

Client code (create 64 connections to the server)::

  nstatuser@nstat-a:~$ cat client_orphan.py
  import socket
  import time

  server = 'nstat-b' # server address
  port = 9000

  count = 64

  connection_list = []

  # Open the connections and keep them referenced so they stay open.
  for i in range(count):
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect((server, port))
      connection_list.append(s)
      print("connection_count: %d" % len(connection_list))

  while True:
      time.sleep(99999)

Server code (accept 64 connections from the client)::

  nstatuser@nstat-b:~$ cat server_orphan.py
  import socket
  import time

  port = 9000
  count = 64

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(count)
  connection_list = []
  # Accept connections forever and keep them open.
  while True:
      sock, addr = s.accept()
      connection_list.append((sock, addr))
      print("connection_count: %d" % len(connection_list))

Run the Python scripts on server and client.

On server::

  python3 server_orphan.py

On client::

  python3 client_orphan.py

Run iptables on server::

  sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP

Type Ctrl-C on client to stop client_orphan.py.

Check TcpExtTCPAbortOnMemory on client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnMemory          54                 0.0

Check the orphan socket count on client::

  nstatuser@nstat-a:~$ ss -s
  Total: 131 (kernel 0)
  TCP:   14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0

  Transport Total     IP        IPv6
  *         0         -         -
  RAW       1         0         1
  UDP       1         1         0
  TCP       14        13        1
  INET      16        14        2
  FRAG      0         0         0

The explanation of the test: after running server_orphan.py and
client_orphan.py, we set up 64 connections between server and
client.
After the iptables command is run, the server drops all packets from
the client. When we type Ctrl-C on client_orphan.py, the client system
tries to close these connections, and before they are closed
gracefully, these connections become orphan sockets. As the iptables rule
on the server blocks packets from the client, the server won't receive the FIN
from the client, so all connections on the client are stuck in the FIN_WAIT_1
state and remain orphan sockets until they time out. We echoed
10 to /proc/sys/net/ipv4/tcp_max_orphans, so the client system would
only keep 10 orphan sockets; for all other orphan sockets, the client
system sent a RST and deleted them. We had 64 connections, so
the 'ss -s' command shows the system has 10 orphan sockets, and the
value of TcpExtTCPAbortOnMemory was 54.

An additional explanation about the orphan socket count: you can find the
exact orphan socket count with the 'ss -s' command, but when the kernel
decides whether to increase TcpExtTCPAbortOnMemory and send a RST, it
doesn't always check the exact orphan socket count. To improve
performance, the kernel checks an approximate count first; if the
approximate count is more than tcp_max_orphans, the kernel checks the
exact count again. So if the approximate count is less than
tcp_max_orphans, but the exact count is more than tcp_max_orphans, you
would find that TcpExtTCPAbortOnMemory is not increased at all. If
tcp_max_orphans is large enough, this won't occur, but if you decrease
tcp_max_orphans to a small value as in our test, you might hit this
issue. That is why the client in our test set up 64 connections although
tcp_max_orphans is 10. If the client only set up 11 connections, we
wouldn't see any change in TcpExtTCPAbortOnMemory.

Continuing the previous test, we wait for several minutes.
Because
iptables on the server blocked the traffic, the server wouldn't receive the
FIN, and all the client's orphan sockets would eventually time out in the
FIN_WAIT_1 state. So after waiting a few minutes, we could find
10 timeouts on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnTimeout         10                 0.0

TcpExtTCPAbortOnLinger
----------------------
The server side code::

  nstatuser@nstat-b:~$ cat server_linger.py
  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

The client side code::

  nstatuser@nstat-a:~$ cat client_linger.py
  import socket
  import struct

  server = 'nstat-b' # server address
  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  # l_onoff = 1, l_linger = 10 seconds
  s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
  s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
  s.connect((server, port))
  s.close()

Run server_linger.py on server::

  nstatuser@nstat-b:~$ python3 server_linger.py

Run client_linger.py on client::

  nstatuser@nstat-a:~$ python3 client_linger.py

After running client_linger.py, check the output of nstat::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnLinger          1                  0.0
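
For comparison, the TcpExtTCPAbortOnData case described in the TCP abort
section uses the same SO_LINGER option, but with l_linger set to 0. The
client below is a minimal sketch under that description only. The helper
names linger_option and abortive_close are hypothetical, and the server
address and port are assumed to match the earlier examples. It was not run
as part of the tests in this document, so no nstat output is shown.

```python
import socket
import struct


def linger_option(onoff, seconds):
    """Pack a struct linger as two C ints: l_onoff and l_linger."""
    return struct.pack('ii', onoff, seconds)


def abortive_close(server, port, payload=b'hello'):
    """Connect, send some data, then close with l_onoff=1 and l_linger=0.

    Per the SO_LINGER description in the TCP abort section, the kernel is
    expected to send a RST on close and increase TcpExtTCPAbortOnData.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_option(1, 0))
    s.connect((server, port))
    s.send(payload)
    s.close()


# Example call (assumes a listener such as server_linger.py on nstat-b):
# abortive_close('nstat-b', 9000)
```

If this behaves as described, running 'nstat | grep -i abort' on the client
afterwards should show TcpExtTCPAbortOnData rather than
TcpExtTCPAbortOnLinger.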