.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

The flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is a ``struct bpf_flow_keys`` and contains the flow dissector
input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to nhoff
  * ``n_proto`` - L3 protocol type, parsed out of L2 header

The flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.
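
As a rough illustration of the contract above, the following userspace C
sketch mimics what a dissector does for IPv4 carrying TCP or UDP. It is not
BPF code and not the reference implementation: ``struct sketch_flow_keys``,
``sketch_dissect`` and the ``SKETCH_*`` constants are hypothetical names, the
struct is a trimmed-down stand-in for ``struct bpf_flow_keys``, and all values
are kept in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the BPF_OK / BPF_DROP return codes. */
enum { SKETCH_OK = 0, SKETCH_DROP = 2 };

/* Hypothetical, trimmed-down stand-in for struct bpf_flow_keys. */
struct sketch_flow_keys {
    uint16_t nhoff;    /* in/out: offset of the network header */
    uint16_t thoff;    /* in/out: offset of the transport header */
    uint16_t n_proto;  /* in: L3 protocol (host byte order in this sketch) */
    uint8_t  ip_proto; /* out: L4 protocol number */
    uint16_t sport;    /* out: source port (host byte order in this sketch) */
    uint16_t dport;    /* out: destination port */
};

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Dissect IPv4 + TCP/UDP found between data and data_end. */
static int sketch_dissect(const uint8_t *data, const uint8_t *data_end,
                          struct sketch_flow_keys *keys)
{
    size_t len = (size_t)(data_end - data);
    size_t nh = keys->nhoff;
    size_t ihl;

    if (keys->n_proto != 0x0800)        /* this sketch handles IPv4 only */
        return SKETCH_DROP;
    if (nh + 20 > len)                  /* minimal IPv4 header must fit */
        return SKETCH_DROP;

    ihl = (size_t)(data[nh] & 0x0f) * 4; /* header length in bytes */
    if (ihl < 20 || nh + ihl > len)
        return SKETCH_DROP;

    keys->ip_proto = data[nh + 9];      /* IPv4 "protocol" field */
    keys->thoff = (uint16_t)(nh + ihl); /* transport header follows IPv4 */

    if (keys->ip_proto == 6 || keys->ip_proto == 17) { /* TCP or UDP */
        size_t th = keys->thoff;

        if (th + 4 > len)               /* both port fields must fit */
            return SKETCH_DROP;
        keys->sport = load_be16(data + th);
        keys->dport = load_be16(data + th + 2);
    }
    return SKETCH_OK;
}
```

Every read is bounds-checked against the end of the packet before it happens,
which is the same discipline the verifier enforces on ``data`` and
``data_end`` accesses in a real program.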

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In the VLAN case, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case VLAN information has been processed before the flow dissector
and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: a BPF flow dissector program can be called
with an optional VLAN header and should gracefully handle both cases: when a
single or double VLAN tag is present and when it is not. The same program
can be called in both cases and has to be written carefully to handle them.
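
One way to cover both entry states is to skip VLAN tags only when ``n_proto``
says one is present. The userspace C sketch below illustrates the idea under
stated assumptions: ``struct sketch_vlan_state`` and ``sketch_skip_vlan`` are
hypothetical names, values are in host byte order, and the bounded ``for``
loop stands in for what a BPF program without loop support would write out
twice.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q  0x8100 /* single VLAN tag (802.1Q) */
#define SKETCH_ETH_P_8021AD 0x88A8 /* outer tag of a double tag (802.1AD) */
#define SKETCH_VLAN_TAG_LEN 4      /* TCI (2 bytes) + encapsulated EtherType (2 bytes) */

/* Hypothetical subset of the dissector inputs, host byte order. */
struct sketch_vlan_state {
    uint16_t nhoff;
    uint16_t thoff;
    uint16_t n_proto;
};

static int sketch_is_vlan(uint16_t proto)
{
    return proto == SKETCH_ETH_P_8021Q || proto == SKETCH_ETH_P_8021AD;
}

static uint16_t sketch_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Skip up to two VLAN tags. On entry, if n_proto is a VLAN TPID, nhoff
 * points at the first byte of the TCI (the pre-VLAN state); otherwise
 * nothing is changed (the VLAN-less and post-VLAN states).
 * Returns 0 on success, -1 on truncation or more than two tags.
 */
static int sketch_skip_vlan(const uint8_t *data, size_t len,
                            struct sketch_vlan_state *st)
{
    int tags;

    for (tags = 0; tags < 2 && sketch_is_vlan(st->n_proto); tags++) {
        if ((size_t)st->nhoff + SKETCH_VLAN_TAG_LEN > len)
            return -1;
        /* The encapsulated EtherType sits right after the 2-byte TCI. */
        st->n_proto = sketch_be16(data + st->nhoff + 2);
        st->nhoff += SKETCH_VLAN_TAG_LEN;
        st->thoff += SKETCH_VLAN_TAG_LEN;
    }
    /* Still a VLAN TPID after two tags: refuse triple tagging. */
    return sketch_is_vlan(st->n_proto) ? -1 : 0;
}
```

After a successful call, ``nhoff`` points at the first byte of the L3 header
and ``n_proto`` holds the L3 protocol regardless of which state the dissector
was entered in, so the L3 handlers do not need to know about VLANs.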


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
    does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).
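
The dispatch pattern can be sketched in plain userspace C, with an array of
function pointers standing in for the ``jmp_table`` map and a bounded driver
loop standing in for ``bpf_tail_call``. This is an illustration only, not the
code from ``bpf_flow.c``: every ``sketch_*`` and ``SLOT_*`` name and the hop
limit are hypothetical, only IP-in-IP encapsulation is handled, and IPv6
extension headers are not parsed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slots; the first SLOT_COUNT entries index the jump table. */
enum sketch_slot { SLOT_IPV4, SLOT_IPV6, SLOT_COUNT, SLOT_DONE, SLOT_DROP };

struct sketch_ctx {
    const uint8_t *data;
    size_t len;
    size_t nhoff;     /* offset of the header the next handler parses */
    uint8_t ip_proto; /* innermost L4 protocol found */
    int is_encap;     /* set once an inner IP header has been seen */
};

typedef enum sketch_slot (*sketch_handler)(struct sketch_ctx *ctx);

/* Pick the next handler from the protocol carried by the current header. */
static enum sketch_slot sketch_next(struct sketch_ctx *ctx, uint8_t proto)
{
    if (proto == 4) {  /* IPv4 encapsulated in IP */
        ctx->is_encap = 1;
        return SLOT_IPV4;
    }
    if (proto == 41) { /* IPv6 encapsulated in IP */
        ctx->is_encap = 1;
        return SLOT_IPV6;
    }
    ctx->ip_proto = proto;
    return SLOT_DONE;
}

static enum sketch_slot sketch_ipv4(struct sketch_ctx *ctx)
{
    const uint8_t *iph;
    size_t ihl;

    if (ctx->nhoff + 20 > ctx->len)
        return SLOT_DROP;
    iph = ctx->data + ctx->nhoff;
    ihl = (size_t)(iph[0] & 0x0f) * 4;
    if (ihl < 20 || ctx->nhoff + ihl > ctx->len)
        return SLOT_DROP;
    ctx->nhoff += ihl;
    return sketch_next(ctx, iph[9]);
}

static enum sketch_slot sketch_ipv6(struct sketch_ctx *ctx)
{
    uint8_t nexthdr;

    if (ctx->nhoff + 40 > ctx->len)
        return SLOT_DROP;
    nexthdr = ctx->data[ctx->nhoff + 6]; /* extension headers not handled */
    ctx->nhoff += 40;
    return sketch_next(ctx, nexthdr);
}

/* Userspace stand-in for the jmp_table map. */
static const sketch_handler sketch_jmp_table[SLOT_COUNT] = {
    [SLOT_IPV4] = sketch_ipv4,
    [SLOT_IPV6] = sketch_ipv6,
};

#define SKETCH_MAX_HOPS 8 /* hypothetical cap, mimicking a tail-call limit */

/* Run handlers until one reports done or drop. Returns 0 or -1. */
static int sketch_run(struct sketch_ctx *ctx, enum sketch_slot first)
{
    enum sketch_slot slot = first;
    int hops;

    for (hops = 0; hops < SKETCH_MAX_HOPS; hops++) {
        if (slot == SLOT_DONE)
            return 0;
        if ((int)slot >= SLOT_COUNT) /* SLOT_DROP or an unknown slot */
            return -1;
        slot = sketch_jmp_table[slot](ctx);
    }
    return slot == SLOT_DONE ? 0 : -1;
}
```

Each handler parses exactly one header and names its successor, so nested
encapsulation is handled by re-entering the table rather than by a backward
jump inside any single handler.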


Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. Notable examples are single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to
``struct bpf_flow_keys`` for the set of information that can currently be
exported from the BPF context.
