.. SPDX-License-Identifier: GPL-2.0

====================================
Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to the conntrack semantics (i.e. traffic has been
seen in both directions), you can decide to offload the flow to the flowtable
from the forward chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit() and therefore bypass the classic
forwarding path. The visible effect is that you do not see these packets from
any of the netfilter hooks coming after the ingress hook. In case of a
flowtable miss, the packet follows the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful in
case there are several conntrack zones in place).

Flowtables are populated via the 'flow offload' nftables action, so the user
can selectively specify which flows are placed into the flowtable. Hence,
packets follow the classic forwarding path unless the user explicitly instructs
packets to use this alternative forwarding path via nftables policy.

This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.

::

                                         userspace process
                                          ^              |
                                          |              |
                                     _____|____     ____\/___
                                    /          \   /         \
                                    |   input   |  |  output  |
                                    \__________/   \_________/
                                         ^               |
                                         |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^                          |               ^                |
   flowtable  |                     ____\/___            |                |
       |      |                    /         \           |                |
    __\/___   |                    | forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                 'flow offload' rule                       |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path: the transport selectors are missing from fragments, so a flowtable lookup
is not possible.

Example configuration
---------------------

Enabling the flowtable bypass is relatively easy. You only need to create a
flowtable and add one rule to your forward chain::

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
                chain y {
                        type filter hook forward priority 0; policy accept;
                        ip protocol tcp flow offload @f
                        counter packets 0 bytes 0
                }
        }

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning.
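
As a rough sketch of such partitioning, the ruleset below defines two
flowtables bound to disjoint sets of devices and selects between them by input
interface. The device names eth2 and eth3, the flowtable names 'f_lan' and
'f_dmz', and the choice to partition by input interface are assumptions made
for illustration only; they do not come from the example above, and the ruleset
has not been tested against a particular kernel or nft version::

        table inet x {
                # Hypothetical partition 1: flows entering via eth0/eth1
                flowtable f_lan {
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
                # Hypothetical partition 2: flows entering via eth2/eth3
                flowtable f_dmz {
                        hook ingress priority 0; devices = { eth2, eth3 };
                }
                chain y {
                        type filter hook forward priority 0; policy accept;
                        iifname { "eth0", "eth1" } ip protocol tcp flow offload @f_lan
                        iifname { "eth2", "eth3" } ip protocol tcp flow offload @f_dmz
                        counter
                }
        }
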
The flowtable priority defines the order in which hooks are run in the
pipeline. This is convenient in case you already have an nftables ingress
chain: make sure the flowtable priority is smaller than the priority of the
nftables ingress chain, so that the flowtable runs first in the pipeline.

The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP SYN-ACK packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.

More reading
------------

This documentation is based on the LWN.net articles [1]_\ [2]_. Rafal Milecki
also wrote a comprehensive summary called "A state of network acceleration"
that describes how things were before this infrastructure was mainlined [3]_,
and he also gives a rough summary of this work [4]_.

.. [1] https://lwn.net/Articles/738214/
.. [2] https://lwn.net/Articles/742164/
.. [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
.. [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html