================================
Documentation for /proc/sys/net/
================================

Copyright

Copyright (c) 1999

 - Terrehon Bowden <terrehon@pacbell.net>
 - Bodo Bauer <bb@ricochet.net>

Copyright (c) 2000

 - Jorge Nerin <comandante@zaralinux.com>

Copyright (c) 2009

 - Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in index.rst.

------------------------------------------------------------------------------

This file contains the documentation for the sysctl files in
/proc/sys/net.

The interface to the networking parts of the kernel is located in
/proc/sys/net. The following table shows all possible subdirectories. You may
see only some of them, depending on your kernel's configuration.


Table : Subdirectories in /proc/sys/net

 ========= =================== = ========== ===================
 Directory Content               Directory  Content
 ========= =================== = ========== ===================
 802       E802 protocol         mptcp      Multipath TCP
 appletalk Appletalk protocol    netfilter  Network Filter
 ax25      AX25                  netrom     NET/ROM
 bridge    Bridging              rose       X.25 PLP layer
 core      General parameter     tipc       TIPC
 ethernet  Ethernet protocol     unix       Unix domain sockets
 ipv4      IP version 4          x25        X.25 protocol
 ipv6      IP version 6
 ========= =================== = ========== ===================

1. /proc/sys/net/core - Network core options
============================================

bpf_jit_enable
--------------

This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
and efficient infrastructure that allows executing bytecode at various
hook points. It is used in a number of Linux kernel subsystems such
as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
and security (e.g. seccomp). LLVM has a BPF back end that can compile
restricted C into a sequence of BPF instructions.
After program load
through bpf(2) and passing a verifier in the kernel, a JIT will then
translate these BPF proglets into native CPU instructions. There are
two flavors of JITs; the newer eBPF JIT is currently supported on:

 - x86_64
 - x86_32
 - arm64
 - arm32
 - ppc64
 - ppc32
 - sparc64
 - mips64
 - s390x
 - riscv64
 - riscv32

And the older cBPF JIT is supported on the following archs:

 - mips
 - sparc

eBPF JITs are a superset of cBPF JITs, meaning the kernel will
migrate cBPF instructions into eBPF instructions and then JIT
compile them transparently. Older cBPF JITs can only translate
classic BPF programs (tcpdump filters, seccomp rules, etc.), but not
the eBPF programs loaded through bpf(2).

Values:

 - 0 - disable the JIT (default value)
 - 1 - enable the JIT
 - 2 - enable the JIT and ask the compiler to emit traces on kernel log

bpf_jit_harden
--------------

This enables hardening for the BPF JIT compiler. Supported are eBPF
JIT backends. Enabling hardening trades off performance, but can
mitigate JIT spraying.

Values:

 - 0 - disable JIT hardening (default value)
 - 1 - enable JIT hardening for unprivileged users only
 - 2 - enable JIT hardening for all users

bpf_jit_kallsyms
----------------

When the BPF JIT compiler is enabled, the compiled images are at
addresses unknown to the kernel, meaning they show up neither in traces
nor in /proc/kallsyms. This enables export of these addresses, which can
be used for debugging/tracing. If bpf_jit_harden is enabled, this
feature is disabled.

Values:

 - 0 - disable JIT kallsyms export (default value)
 - 1 - enable JIT kallsyms export for privileged users only

bpf_jit_limit
-------------

This enforces a global limit for memory allocations to the BPF JIT
compiler in order to reject unprivileged JIT requests once it has
been surpassed.
bpf_jit_limit contains the value of the global limit
in bytes.

dev_weight
----------

The maximum number of packets that the kernel can handle on a NAPI
interrupt; it is a per-CPU variable. For drivers that support LRO or
GRO_HW, a hardware-aggregated packet is counted as one packet in this
context.

Default: 64

dev_weight_rx_bias
------------------

RPS processing (e.g. RFS, aRFS) competes with the registered NAPI poll function
of the driver for the per-softirq-cycle netdev_budget. This parameter influences
the proportion of the configured netdev_budget that is spent on RPS-based packet
processing during RX softirq cycles. It further makes the current dev_weight
adaptable to asymmetric CPU needs on the RX/TX side of the network stack
(see dev_weight_tx_bias). It is effective on a per-CPU basis. The determination
is based on dev_weight and is calculated multiplicatively
(dev_weight * dev_weight_rx_bias).

Default: 1

dev_weight_tx_bias
------------------

Scales the maximum number of packets that can be processed during a TX softirq
cycle. Effective on a per-CPU basis. Allows scaling of the current dev_weight
for asymmetric net stack processing needs. Be careful to avoid making TX
softirq processing a CPU hog.

Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).

Default: 1

default_qdisc
-------------

The default queuing discipline to use for network devices. This allows
overriding the default of pfifo_fast with an alternative. Since the default
queuing discipline is created without additional parameters, it is best suited
to queuing disciplines that work well without configuration, like stochastic
fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin,
which require setting up classes and bandwidths.
Note that physical multiqueue
interfaces still use mq as the root qdisc, which in turn uses this default for
its leaves. Virtual devices (like e.g. lo or veth) ignore this setting and
instead default to noqueue.

Default: pfifo_fast

busy_read
---------

Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in us to busy loop waiting for packets on the device queue.
This sets the default value of the SO_BUSY_POLL socket option.
Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
which is the preferred method of enabling the feature. If you need to enable
the feature globally via sysctl, a value of 50 is recommended.

Will increase power usage.

Default: 0 (off)

busy_poll
---------

Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in us to busy loop waiting for events.
The recommended value depends on the number of sockets you poll on:
for several sockets use 50, for several hundreds use 100.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or set
sysctl.net.busy_read globally.

Will increase power usage.

Default: 0 (off)

rmem_default
------------

The default setting of the socket receive buffer in bytes.

rmem_max
--------

The maximum receive socket buffer size in bytes.

tstamp_allow_data
-----------------

Allow processes to receive tx timestamps looped together with the original
packet contents. If disabled, transmit timestamp requests from unprivileged
processes are dropped unless the socket option SOF_TIMESTAMPING_OPT_TSONLY is
set.

Default: 1 (on)


wmem_default
------------

The default setting (in bytes) of the socket send buffer.
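These defaults interact with the per-socket SO_SNDBUF/SO_RCVBUF options. The
following minimal Python sketch (an illustration, not part of the kernel
documentation) shows that a fresh socket's send buffer starts at wmem_default;
socket(7) documents that the kernel doubles any value passed to
setsockopt(SO_SNDBUF) to leave room for bookkeeping overhead.

```python
# Illustrative sketch: a new socket's SO_SNDBUF reflects wmem_default,
# and (per socket(7)) Linux doubles the value given to setsockopt().
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# The initial value comes from net.core.wmem_default.
print("initial SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

# The request is capped by net.core.wmem_max and then doubled on Linux.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)
print("after setsockopt(8192):", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

s.close()
```

Raising wmem_max only lifts the ceiling for such per-socket requests; it does
not resize already-existing sockets.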

wmem_max
--------

The maximum send socket buffer size in bytes.

message_burst and message_cost
------------------------------

These parameters are used to limit the warning messages written to the kernel
log from the networking code. They enforce a rate limit to make a
denial-of-service attack impossible. A higher message_cost factor results in
fewer messages being written. message_burst controls when messages will
be dropped. The default settings limit warning messages to one every five
seconds.

warnings
--------

This sysctl is now unused.

This was used to control console messages from the networking stack that
occur because of problems on the network, like duplicate addresses or bad
checksums.

These messages are now emitted at KERN_DEBUG and can generally be enabled
and controlled by the dynamic_debug facility.

netdev_budget
-------------

Maximum number of packets taken from all interfaces in one polling cycle (NAPI
poll). In one polling cycle interfaces which are registered to polling are
probed in a round-robin manner. Also, a polling cycle may not exceed
netdev_budget_usecs microseconds, even if netdev_budget has not been
exhausted.

netdev_budget_usecs
-------------------

Maximum number of microseconds in one NAPI polling cycle. Polling
will exit when either netdev_budget_usecs have elapsed during the
poll cycle or the number of packets processed reaches netdev_budget.

netdev_max_backlog
------------------

Maximum number of packets queued on the INPUT side when the interface
receives packets faster than the kernel can process them.

netdev_rss_key
--------------

RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
randomly generated.
Some user space might need to gather its content even if drivers do not
provide ethtool -x support yet.

::

  myhost:~# cat /proc/sys/net/core/netdev_rss_key
  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

The file contains nul bytes if no driver ever called the netdev_rss_key_fill()
function.

Note:
  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
  but most drivers only use 40 bytes of it.

::

  myhost:~# ethtool -x eth0
  RX flow hash indirection table for eth0 with 8 RX ring(s):
      0:    0     1     2     3     4     5     6     7
  RSS hash key:
  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89

netdev_tstamp_prequeue
----------------------

If set to 0, RX packet timestamps can be sampled after RPS processing, when
the target CPU processes packets. This might add some delay to the timestamps,
but permits distributing the load over several CPUs.

If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.

netdev_unregister_timeout_secs
------------------------------

Unregister network device timeout in seconds.
This option controls the timeout (in seconds) used to issue a warning while
waiting for a network device refcount to drop to 0 during device
unregistration. A lower value may be useful during bisection to detect
a leaked reference faster. A larger value may be useful to prevent false
warnings on slow/loaded systems.
Default value is 10, minimum 1, maximum 3600.

skb_defer_max
-------------

Maximum size (in skbs) of the per-CPU list of skbs being freed
by the CPU which allocated them. Used by the TCP stack so far.

Default: 64

optmem_max
----------

Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
of struct cmsghdr structures with appended data.
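For illustration, here is a minimal Python sketch (not part of the original
documentation) of what such ancillary data looks like in practice: passing a
file descriptor over an AF_UNIX socket pair with SCM_RIGHTS. Each
(level, type, data) triple below is serialized by the kernel as one of the
struct cmsghdr structures whose total size optmem_max bounds. It assumes a
Unix-like system.

```python
# Sketch: ancillary data is sent as a sequence of struct cmsghdr
# structures; optmem_max bounds its size. Here we pass a duplicated
# file descriptor over an AF_UNIX socket pair using SCM_RIGHTS.
import array
import os
import socket

parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

fd_to_send = os.dup(0)  # any open fd works as a payload
fds = array.array("i", [fd_to_send])

# Each (cmsg_level, cmsg_type, cmsg_data) triple becomes one cmsghdr.
parent.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds.tobytes())])

msg, ancdata, flags, addr = child.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))

received = array.array("i")
for level, ctype, data in ancdata:
    if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
        # Truncate to a whole number of fds, as the Python docs suggest.
        received.frombytes(data[: len(data) - (len(data) % received.itemsize)])

print("received fd:", received[0])

os.close(fd_to_send)
os.close(received[0])
parent.close()
child.close()
```

A sendmsg() whose control message sequence would exceed optmem_max fails with
EMSGSIZE, which is why this limit matters to programs that pass many fds.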

fb_tunnels_only_for_init_net
----------------------------

Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
sit0, ip6tnl0, ip6gre0) are automatically created. There are 3 possibilities:

(a) value = 0; respective fallback tunnels are created when the module is
    loaded in every net namespace (backward compatible behavior).
(b) value = 1; [kcmd value: initns] respective fallback tunnels are
    created only in the init net namespace and no other net namespace will
    have them.
(c) value = 2; [kcmd value: none] fallback tunnels are not created
    when a module is loaded in any net namespace. Setting the value to
    "2" is pointless after boot if these modules are built-in, so there is
    a kernel command-line option that can change this default. Please refer to
    Documentation/admin-guide/kernel-parameters.txt for additional details.

Not creating fallback tunnels gives userspace control to create only what
is needed and avoids creating redundant devices.

Default: 0 (for compatibility reasons)

devconf_inherit_init_net
------------------------

Controls if a new network namespace should inherit all current
settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
default, we keep the current behavior: for IPv4 we inherit all current
settings from init_net and for IPv6 we reset all settings to default.

If set to 1, both IPv4 and IPv6 settings are forced to inherit from
the current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
forced to reset to their default values. If set to 3, both IPv4 and IPv6
settings are forced to inherit from the current ones in the netns where this
new netns has been created.

Default: 0 (for compatibility reasons)

txrehash
--------

Controls the default hash rethink behaviour on a listening socket when the
SO_TXREHASH option is set to SOCK_TXREHASH_DEFAULT (i.e.
not overridden by setsockopt).

If set to 1 (default), hash rethink is performed on the listening socket.
If set to 0, hash rethink is not performed.

gro_normal_batch
----------------

Maximum number of segments to batch up on output of GRO. When a packet
exits GRO, either as a coalesced superframe or as an original packet which
GRO has decided not to coalesce, it is placed on a per-NAPI list. This
list is then passed to the stack when the number of segments reaches the
gro_normal_batch limit.

high_order_alloc_disable
------------------------

By default the allocator for page frags tries to use high order pages (order-3
on x86). While the default behavior gives good results in most cases, some users
might have hit contention in page allocation/freeing. This was especially
true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
lists. This setting allows opting in to order-0 allocation instead, but is now
mostly of historical importance.

Default: 0

2. /proc/sys/net/unix - Parameters for Unix domain sockets
==========================================================

There is only one file in this directory.
unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
socket's buffer. It will not take effect unless the PF_UNIX flag is specified.


3. /proc/sys/net/ipv4 - IPv4 settings
=====================================

Please see: Documentation/networking/ip-sysctl.rst and
Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.


4. Appletalk
============

The /proc/sys/net/appletalk directory holds the Appletalk configuration data
when Appletalk is loaded. The configurable parameters are:

aarp-expiry-time
----------------

The amount of time we keep an ARP entry before expiring it. Used to age out
old hosts.

aarp-resolve-time
-----------------

The amount of time we will spend trying to resolve an Appletalk address.

aarp-retransmit-limit
---------------------

The number of times we will retransmit a query before giving up.

aarp-tick-time
--------------

Controls the rate at which expires are checked.

The directory /proc/net/appletalk holds the list of active Appletalk sockets
on a machine.

The fields indicate the DDP type, the local address (in network:node format),
the remote address, the size of the transmit pending queue, the size of the
received queue (bytes waiting for applications to read), the state and the uid
owning the socket.

/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
shows the name of the interface, its Appletalk address, the network range on
that address (or network number for phase 1 networks), and the status of the
interface.

/proc/net/atalk_route lists each known network route. It lists the target
(network) that the route leads to, the router (may be directly connected), the
route flags, and the device the route is using.

5. TIPC
=======

tipc_rmem
---------

The TIPC protocol now has a tunable for the receive memory, similar to
tcp_rmem - i.e. a vector of 3 integers: (min, default, max)

::

  # cat /proc/sys/net/tipc/tipc_rmem
  4252725 34021800 68043600
  #

The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
are scaled (shifted) versions of that same value. Note that the min value
is not at this point in time used in any meaningful way, but the triplet is
preserved in order to be consistent with things like tcp_rmem.

named_timeout
-------------

TIPC name table updates are distributed asynchronously in a cluster, without
any form of transaction handling. This means that different race scenarios are
possible.
One such is that a name withdrawal sent out by one node and received
by another node may arrive after a second, overlapping name publication has
already been accepted from a third node, although the conflicting updates
originally may have been issued in the correct sequential order.
If named_timeout is nonzero, failed topology updates will be placed on a defer
queue until another event arrives that clears the error, or until the timeout
expires. The value is in milliseconds.
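As a small aside, the relationship between the three tipc_rmem values shown in
the example above can be checked numerically. The Python sketch below parses
the sample triple from this document; the shift factors (max >> 1 for the
default, max >> 4 for the min) are inferred from those sample values and are an
assumption, not a statement from the TIPC sources.

```python
# Sketch: parsing the "min default max" triple from the tipc_rmem
# example earlier in this document. The shift factors below are
# inferred from the sample numbers (an assumption, not kernel source).
sample = "4252725 34021800 68043600"

rmem_min, rmem_default, rmem_max = (int(v) for v in sample.split())

# default and min look like shifted-down versions of max (CONN_OVERLOAD_LIMIT)
print(rmem_default == rmem_max >> 1)  # True
print(rmem_min == rmem_max >> 4)      # True
```

On a live system the same parsing would apply to the contents of
/proc/sys/net/tipc/tipc_rmem instead of the sample string.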