.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

Flow dissector is a routine that parses metadata out of the packets. It's
used in various places in the networking subsystem (RFS, flow hash, etc.).

BPF flow dissector is an attempt to reimplement C-based flow dissector logic
in BPF to gain all the benefits of the BPF verifier (namely, limits on the
number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is accessible: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is a ``struct bpf_flow_keys`` and contains the flow dissector
input and output arguments.

The inputs are:

* ``nhoff`` - initial offset of the networking header
* ``thoff`` - initial offset of the transport header, initialized to ``nhoff``
* ``n_proto`` - L3 protocol type, parsed out of L2 header
* ``flags`` - optional flags

A BPF flow dissector program should fill out the rest of the
``struct bpf_flow_keys`` fields.
The input arguments ``nhoff``, ``thoff`` and ``n_proto`` should also be
adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In case of VLAN, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Note that TPID can be 802.1AD, in which case the BPF program has to parse
VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case the VLAN information has been processed before the flow dissector
is invoked, and the BPF flow dissector is not required to handle it.


The takeaway is that a BPF flow dissector program can be called with or
without a VLAN header and should gracefully handle both cases: when a single
or double VLAN tag is present and when none is. The same program is called in
both cases and has to be written carefully to handle each of them.


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to
  continue parsing first fragment; the default expected behavior is that
  flow dissector returns as soon as it finds out that the packet is fragmented;
  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to
  stop parsing as soon as it reaches IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
  parsing as soon as it reaches encapsulated headers; used by routing
  infrastructure.


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:

* ``jmp_table`` map that contains sub-programs for each supported L3 protocol
* ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
  does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).

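
The jump-table idea can be illustrated outside the kernel. The following is a
minimal user-space sketch, not a loadable BPF program and not the reference
implementation: a function-pointer array stands in for the ``jmp_table``
program array, and a plain function call stands in for ``bpf_tail_call``. All
``jt_``/``JT_`` names, the host-byte-order ``n_proto`` and the call bound are
assumptions made for illustration only.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  enum { JT_OK = 0, JT_DROP = 2 };       /* mirror BPF_OK / BPF_DROP */
  enum { JT_IPV4, JT_VLAN, JT_SLOTS };   /* hypothetical table slots */
  #define JT_MAX_CALLS 8                 /* illustrative bound only */

  struct jt_ctx {
      const uint8_t *data;  /* start of packet, like skb->data */
      size_t len;           /* data + len plays the role of data_end */
      size_t nhoff;         /* offset of the header to parse next */
      uint16_t n_proto;     /* host byte order in this sketch */
      uint8_t ip_proto;     /* output: innermost L4 protocol */
      int calls;            /* number of chained dispatches so far */
  };

  typedef int (*jt_handler)(struct jt_ctx *ctx);
  int jt_dispatch(struct jt_ctx *ctx);

  /* Parse one IPv4 header; re-dispatch if it encapsulates another one. */
  static int jt_ipv4(struct jt_ctx *ctx)
  {
      size_t ihl;

      if (ctx->nhoff + 20 > ctx->len)
          return JT_DROP;
      ihl = (size_t)(ctx->data[ctx->nhoff] & 0x0f) * 4;
      if (ihl < 20 || ctx->nhoff + ihl > ctx->len)
          return JT_DROP;
      ctx->ip_proto = ctx->data[ctx->nhoff + 9];
      if (ctx->ip_proto != 4)       /* 4 is IPPROTO_IPIP */
          return JT_OK;
      ctx->nhoff += ihl;            /* step into the inner header */
      return jt_dispatch(ctx);      /* stands in for bpf_tail_call() */
  }

  /* Skip one VLAN tag (TCI + encapsulated EtherType) and re-dispatch. */
  static int jt_vlan(struct jt_ctx *ctx)
  {
      if (ctx->nhoff + 4 > ctx->len)
          return JT_DROP;
      ctx->n_proto = (uint16_t)((ctx->data[ctx->nhoff + 2] << 8) |
                                ctx->data[ctx->nhoff + 3]);
      ctx->nhoff += 4;
      return jt_dispatch(ctx);
  }

  static const jt_handler jt_table[JT_SLOTS] = {
      [JT_IPV4] = jt_ipv4,
      [JT_VLAN] = jt_vlan,
  };

  /* Entry point: map n_proto to a slot and jump to its handler. */
  int jt_dispatch(struct jt_ctx *ctx)
  {
      if (ctx->calls++ >= JT_MAX_CALLS)
          return JT_DROP;
      switch (ctx->n_proto) {
      case 0x0800:                  /* ETH_P_IP */
          return jt_table[JT_IPV4](ctx);
      case 0x8100:                  /* ETH_P_8021Q */
      case 0x88A8:                  /* ETH_P_8021AD */
          return jt_table[JT_VLAN](ctx);
      default:
          return JT_DROP;
      }
  }

Each handler consumes exactly one header and then hands control back to the
dispatcher, so nested headers are handled without a backward jump inside any
single handler. The explicit call counter plays the role of the limit on
chained tail calls mentioned in the overview.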

Current Limitations
===================

BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to
``struct bpf_flow_keys`` for the set of information that can currently be
exported from the BPF context.

When BPF flow dissector is attached to the root network namespace (machine-wide
policy), users can't override it in their child network namespaces.
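
As a closing illustration, the ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG``
behavior described in the Flags section can be sketched in plain user-space C.
This is a hedged sketch and not kernel code: the ``sketch_`` names are
hypothetical, ``frag_off`` is taken in host byte order, and the flag macro is
a stand-in assumed to use the same bit as the real flag in ``<linux/bpf.h>``.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Stand-in for BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG (assumed bit 0). */
  #define SKETCH_F_PARSE_1ST_FRAG (1U << 0)
  #define SKETCH_IP_MF     0x2000u  /* "more fragments" bit of frag_off */
  #define SKETCH_IP_OFFSET 0x1fffu  /* fragment offset bits of frag_off */

  struct sketch_frag {
      bool is_frag;        /* packet is a fragment */
      bool is_first_frag;  /* fragment with offset 0 */
      bool keep_parsing;   /* false: return the keys gathered so far */
  };

  /* frag_off is the IPv4 fragment field in host byte order. */
  struct sketch_frag sketch_check_frag(uint16_t frag_off, uint32_t flags)
  {
      struct sketch_frag st = { false, false, true };

      if (!(frag_off & (SKETCH_IP_MF | SKETCH_IP_OFFSET)))
          return st;                    /* not fragmented */

      st.is_frag = true;
      if (frag_off & SKETCH_IP_OFFSET) {
          st.keep_parsing = false;      /* later fragment: no L4 header */
      } else {
          st.is_first_frag = true;
          st.keep_parsing = (flags & SKETCH_F_PARSE_1ST_FRAG) != 0;
      }
      return st;
  }

By default the sketch stops at the first sign of fragmentation; only a first
fragment combined with the flag continues into the transport header, which
matches the use case of estimating the total header length.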