xref: /openbmc/linux/Documentation/bpf/verifier.rst (revision 9a87ffc99ec8eb8d35eed7c4f816d75f5cc9662e)
188691e9eSChristoph Hellwig
288691e9eSChristoph Hellwig=============
388691e9eSChristoph HellwigeBPF verifier
488691e9eSChristoph Hellwig=============
588691e9eSChristoph Hellwig
688691e9eSChristoph HellwigThe safety of the eBPF program is determined in two steps.
788691e9eSChristoph Hellwig
888691e9eSChristoph HellwigFirst step does DAG check to disallow loops and other CFG validation.
988691e9eSChristoph HellwigIn particular it will detect programs that have unreachable instructions.
1088691e9eSChristoph Hellwig(though classic BPF checker allows them)
1188691e9eSChristoph Hellwig
1288691e9eSChristoph HellwigSecond step starts from the first insn and descends all possible paths.
1388691e9eSChristoph HellwigIt simulates execution of every insn and observes the state change of
1488691e9eSChristoph Hellwigregisters and stack.
1588691e9eSChristoph Hellwig
1688691e9eSChristoph HellwigAt the start of the program the register R1 contains a pointer to context
1788691e9eSChristoph Hellwigand has type PTR_TO_CTX.
1888691e9eSChristoph HellwigIf verifier sees an insn that does R2=R1, then R2 has now type
1988691e9eSChristoph HellwigPTR_TO_CTX as well and can be used on the right hand side of expression.
2088691e9eSChristoph HellwigIf R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=SCALAR_VALUE,
2188691e9eSChristoph Hellwigsince addition of two valid pointers makes invalid pointer.
2288691e9eSChristoph Hellwig(In 'secure' mode verifier will reject any type of pointer arithmetic to make
2388691e9eSChristoph Hellwigsure that kernel addresses don't leak to unprivileged users)
2488691e9eSChristoph Hellwig
2588691e9eSChristoph HellwigIf register was never written to, it's not readable::
2688691e9eSChristoph Hellwig
2788691e9eSChristoph Hellwig  bpf_mov R0 = R2
2888691e9eSChristoph Hellwig  bpf_exit
2988691e9eSChristoph Hellwig
3088691e9eSChristoph Hellwigwill be rejected, since R2 is unreadable at the start of the program.
3188691e9eSChristoph Hellwig
3288691e9eSChristoph HellwigAfter kernel function call, R1-R5 are reset to unreadable and
3388691e9eSChristoph HellwigR0 has a return type of the function.
3488691e9eSChristoph Hellwig
3588691e9eSChristoph HellwigSince R6-R9 are callee saved, their state is preserved across the call.
3688691e9eSChristoph Hellwig
3788691e9eSChristoph Hellwig::
3888691e9eSChristoph Hellwig
3988691e9eSChristoph Hellwig  bpf_mov R6 = 1
4088691e9eSChristoph Hellwig  bpf_call foo
4188691e9eSChristoph Hellwig  bpf_mov R0 = R6
4288691e9eSChristoph Hellwig  bpf_exit
4388691e9eSChristoph Hellwig
4488691e9eSChristoph Hellwigis a correct program. If there was R1 instead of R6, it would have
4588691e9eSChristoph Hellwigbeen rejected.
4688691e9eSChristoph Hellwig
4788691e9eSChristoph Hellwigload/store instructions are allowed only with registers of valid types, which
4888691e9eSChristoph Hellwigare PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
4988691e9eSChristoph HellwigFor example::
5088691e9eSChristoph Hellwig
5188691e9eSChristoph Hellwig bpf_mov R1 = 1
5288691e9eSChristoph Hellwig bpf_mov R2 = 2
5388691e9eSChristoph Hellwig bpf_xadd *(u32 *)(R1 + 3) += R2
5488691e9eSChristoph Hellwig bpf_exit
5588691e9eSChristoph Hellwig
5688691e9eSChristoph Hellwigwill be rejected, since R1 doesn't have a valid pointer type at the time of
5788691e9eSChristoph Hellwigexecution of instruction bpf_xadd.
5888691e9eSChristoph Hellwig
5988691e9eSChristoph HellwigAt the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``)
6088691e9eSChristoph HellwigA callback is used to customize verifier to restrict eBPF program access to only
6188691e9eSChristoph Hellwigcertain fields within ctx structure with specified size and alignment.
6288691e9eSChristoph Hellwig
6388691e9eSChristoph HellwigFor example, the following insn::
6488691e9eSChristoph Hellwig
6588691e9eSChristoph Hellwig  bpf_ld R0 = *(u32 *)(R6 + 8)
6688691e9eSChristoph Hellwig
6788691e9eSChristoph Hellwigintends to load a word from address R6 + 8 and store it into R0
6888691e9eSChristoph HellwigIf R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
6988691e9eSChristoph Hellwigthat offset 8 of size 4 bytes can be accessed for reading, otherwise
7088691e9eSChristoph Hellwigthe verifier will reject the program.
7188691e9eSChristoph HellwigIf R6=PTR_TO_STACK, then access should be aligned and be within
7288691e9eSChristoph Hellwigstack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
7388691e9eSChristoph Hellwigso it will fail verification, since it's out of bounds.
7488691e9eSChristoph Hellwig
7588691e9eSChristoph HellwigThe verifier will allow eBPF program to read data from stack only after
7688691e9eSChristoph Hellwigit wrote into it.
7788691e9eSChristoph Hellwig
7888691e9eSChristoph HellwigClassic BPF verifier does similar check with M[0-15] memory slots.
7988691e9eSChristoph HellwigFor example::
8088691e9eSChristoph Hellwig
8188691e9eSChristoph Hellwig  bpf_ld R0 = *(u32 *)(R10 - 4)
8288691e9eSChristoph Hellwig  bpf_exit
8388691e9eSChristoph Hellwig
8488691e9eSChristoph Hellwigis invalid program.
8588691e9eSChristoph HellwigThough R10 is correct read-only register and has type PTR_TO_STACK
8688691e9eSChristoph Hellwigand R10 - 4 is within stack bounds, there were no stores into that location.
8788691e9eSChristoph Hellwig
8888691e9eSChristoph HellwigPointer register spill/fill is tracked as well, since four (R6-R9)
8988691e9eSChristoph Hellwigcallee saved registers may not be enough for some programs.
9088691e9eSChristoph Hellwig
9188691e9eSChristoph HellwigAllowed function calls are customized with bpf_verifier_ops->get_func_proto()
9288691e9eSChristoph HellwigThe eBPF verifier will check that registers match argument constraints.
9388691e9eSChristoph HellwigAfter the call register R0 will be set to return type of the function.
9488691e9eSChristoph Hellwig
9588691e9eSChristoph HellwigFunction calls is a main mechanism to extend functionality of eBPF programs.
9688691e9eSChristoph HellwigSocket filters may let programs to call one set of functions, whereas tracing
9788691e9eSChristoph Hellwigfilters may allow completely different set.
9888691e9eSChristoph Hellwig
9988691e9eSChristoph HellwigIf a function made accessible to eBPF program, it needs to be thought through
10088691e9eSChristoph Hellwigfrom safety point of view. The verifier will guarantee that the function is
10188691e9eSChristoph Hellwigcalled with valid arguments.
10288691e9eSChristoph Hellwig
10388691e9eSChristoph Hellwigseccomp vs socket filters have different security restrictions for classic BPF.
10488691e9eSChristoph HellwigSeccomp solves this by two stage verifier: classic BPF verifier is followed
10588691e9eSChristoph Hellwigby seccomp verifier. In case of eBPF one configurable verifier is shared for
10688691e9eSChristoph Hellwigall use cases.
10788691e9eSChristoph Hellwig
10888691e9eSChristoph HellwigSee details of eBPF verifier in kernel/bpf/verifier.c
10988691e9eSChristoph Hellwig
11088691e9eSChristoph HellwigRegister value tracking
11188691e9eSChristoph Hellwig=======================
11288691e9eSChristoph Hellwig
11388691e9eSChristoph HellwigIn order to determine the safety of an eBPF program, the verifier must track
11488691e9eSChristoph Hellwigthe range of possible values in each register and also in each stack slot.
11588691e9eSChristoph HellwigThis is done with ``struct bpf_reg_state``, defined in include/linux/
11688691e9eSChristoph Hellwigbpf_verifier.h, which unifies tracking of scalar and pointer values.  Each
11788691e9eSChristoph Hellwigregister state has a type, which is either NOT_INIT (the register has not been
11888691e9eSChristoph Hellwigwritten to), SCALAR_VALUE (some value which is not usable as a pointer), or a
11988691e9eSChristoph Hellwigpointer type.  The types of pointers describe their base, as follows:
12088691e9eSChristoph Hellwig
12188691e9eSChristoph Hellwig
12288691e9eSChristoph Hellwig    PTR_TO_CTX
12388691e9eSChristoph Hellwig			Pointer to bpf_context.
12488691e9eSChristoph Hellwig    CONST_PTR_TO_MAP
12588691e9eSChristoph Hellwig			Pointer to struct bpf_map.  "Const" because arithmetic
12688691e9eSChristoph Hellwig			on these pointers is forbidden.
12788691e9eSChristoph Hellwig    PTR_TO_MAP_VALUE
12888691e9eSChristoph Hellwig			Pointer to the value stored in a map element.
12988691e9eSChristoph Hellwig    PTR_TO_MAP_VALUE_OR_NULL
13088691e9eSChristoph Hellwig			Either a pointer to a map value, or NULL; map accesses
13188691e9eSChristoph Hellwig			(see maps.rst) return this type, which becomes a
13288691e9eSChristoph Hellwig			PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
13388691e9eSChristoph Hellwig			these pointers is forbidden.
13488691e9eSChristoph Hellwig    PTR_TO_STACK
13588691e9eSChristoph Hellwig			Frame pointer.
13688691e9eSChristoph Hellwig    PTR_TO_PACKET
13788691e9eSChristoph Hellwig			skb->data.
13888691e9eSChristoph Hellwig    PTR_TO_PACKET_END
13988691e9eSChristoph Hellwig			skb->data + headlen; arithmetic forbidden.
14088691e9eSChristoph Hellwig    PTR_TO_SOCKET
14188691e9eSChristoph Hellwig			Pointer to struct bpf_sock_ops, implicitly refcounted.
14288691e9eSChristoph Hellwig    PTR_TO_SOCKET_OR_NULL
14388691e9eSChristoph Hellwig			Either a pointer to a socket, or NULL; socket lookup
14488691e9eSChristoph Hellwig			returns this type, which becomes a PTR_TO_SOCKET when
14588691e9eSChristoph Hellwig			checked != NULL. PTR_TO_SOCKET is reference-counted,
14688691e9eSChristoph Hellwig			so programs must release the reference through the
14788691e9eSChristoph Hellwig			socket release function before the end of the program.
14888691e9eSChristoph Hellwig			Arithmetic on these pointers is forbidden.
14988691e9eSChristoph Hellwig
15088691e9eSChristoph HellwigHowever, a pointer may be offset from this base (as a result of pointer
15188691e9eSChristoph Hellwigarithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
15288691e9eSChristoph Hellwigoffset'.  The former is used when an exactly-known value (e.g. an immediate
15388691e9eSChristoph Hellwigoperand) is added to a pointer, while the latter is used for values which are
15488691e9eSChristoph Hellwignot exactly known.  The variable offset is also used in SCALAR_VALUEs, to track
15588691e9eSChristoph Hellwigthe range of possible values in the register.
15688691e9eSChristoph Hellwig
15788691e9eSChristoph HellwigThe verifier's knowledge about the variable offset consists of:
15888691e9eSChristoph Hellwig
15988691e9eSChristoph Hellwig* minimum and maximum values as unsigned
16088691e9eSChristoph Hellwig* minimum and maximum values as signed
16188691e9eSChristoph Hellwig
16288691e9eSChristoph Hellwig* knowledge of the values of individual bits, in the form of a 'tnum': a u64
16388691e9eSChristoph Hellwig  'mask' and a u64 'value'.  1s in the mask represent bits whose value is unknown;
16488691e9eSChristoph Hellwig  1s in the value represent bits known to be 1.  Bits known to be 0 have 0 in both
16588691e9eSChristoph Hellwig  mask and value; no bit should ever be 1 in both.  For example, if a byte is read
16688691e9eSChristoph Hellwig  into a register from memory, the register's top 56 bits are known zero, while
16788691e9eSChristoph Hellwig  the low 8 are unknown - which is represented as the tnum (0x0; 0xff).  If we
16888691e9eSChristoph Hellwig  then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
16988691e9eSChristoph Hellwig  0x1ff), because of potential carries.
17088691e9eSChristoph Hellwig
17188691e9eSChristoph HellwigBesides arithmetic, the register state can also be updated by conditional
17288691e9eSChristoph Hellwigbranches.  For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
17388691e9eSChristoph Hellwigit will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
17488691e9eSChristoph Hellwigbranch it will have a umax_value of 8.  A signed compare (with BPF_JSGT or
17588691e9eSChristoph HellwigBPF_JSGE) would instead update the signed minimum/maximum values.  Information
17688691e9eSChristoph Hellwigfrom the signed and unsigned bounds can be combined; for instance if a value is
17788691e9eSChristoph Hellwigfirst tested < 8 and then tested s> 4, the verifier will conclude that the value
17888691e9eSChristoph Hellwigis also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
17988691e9eSChristoph Hellwig
18088691e9eSChristoph HellwigPTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
18188691e9eSChristoph Hellwigpointers sharing that same variable offset.  This is important for packet range
18288691e9eSChristoph Hellwigchecks: after adding a variable to a packet pointer register A, if you then copy
18388691e9eSChristoph Hellwigit to another register B and then add a constant 4 to A, both registers will
18488691e9eSChristoph Hellwigshare the same 'id' but the A will have a fixed offset of +4.  Then if A is
18588691e9eSChristoph Hellwigbounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
18688691e9eSChristoph Hellwignow known to have a safe range of at least 4 bytes.  See 'Direct packet access',
18788691e9eSChristoph Hellwigbelow, for more on PTR_TO_PACKET ranges.
18888691e9eSChristoph Hellwig
18988691e9eSChristoph HellwigThe 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
19088691e9eSChristoph Hellwigthe pointer returned from a map lookup.  This means that when one copy is
19188691e9eSChristoph Hellwigchecked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
19288691e9eSChristoph HellwigAs well as range-checking, the tracked information is also used for enforcing
19388691e9eSChristoph Hellwigalignment of pointer accesses.  For instance, on most systems the packet pointer
19488691e9eSChristoph Hellwigis 2 bytes after a 4-byte alignment.  If a program adds 14 bytes to that to jump
1951d3cab43SRandy Dunlapover the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
19688691e9eSChristoph Hellwigpointer will have a variable offset known to be 4n+2 for some n, so adding the 2
19788691e9eSChristoph Hellwigbytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
19888691e9eSChristoph Hellwigthat pointer are safe.
19988691e9eSChristoph HellwigThe 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
20088691e9eSChristoph Hellwigto all copies of the pointer returned from a socket lookup. This has similar
20188691e9eSChristoph Hellwigbehaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
20288691e9eSChristoph Hellwigit also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
20388691e9eSChristoph Hellwigrepresents a reference to the corresponding ``struct sock``. To ensure that the
20488691e9eSChristoph Hellwigreference is not leaked, it is imperative to NULL-check the reference and in
20588691e9eSChristoph Hellwigthe non-NULL case, and pass the valid reference to the socket release function.
20688691e9eSChristoph Hellwig
20788691e9eSChristoph HellwigDirect packet access
20888691e9eSChristoph Hellwig====================
20988691e9eSChristoph Hellwig
21088691e9eSChristoph HellwigIn cls_bpf and act_bpf programs the verifier allows direct access to the packet
21188691e9eSChristoph Hellwigdata via skb->data and skb->data_end pointers.
21288691e9eSChristoph HellwigEx::
21388691e9eSChristoph Hellwig
21488691e9eSChristoph Hellwig    1:  r4 = *(u32 *)(r1 +80)  /* load skb->data_end */
21588691e9eSChristoph Hellwig    2:  r3 = *(u32 *)(r1 +76)  /* load skb->data */
21688691e9eSChristoph Hellwig    3:  r5 = r3
21788691e9eSChristoph Hellwig    4:  r5 += 14
21888691e9eSChristoph Hellwig    5:  if r5 > r4 goto pc+16
21988691e9eSChristoph Hellwig    R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
22088691e9eSChristoph Hellwig    6:  r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */
22188691e9eSChristoph Hellwig
22288691e9eSChristoph Hellwigthis 2byte load from the packet is safe to do, since the program author
22388691e9eSChristoph Hellwigdid check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
22488691e9eSChristoph Hellwigmeans that in the fall-through case the register R3 (which points to skb->data)
22588691e9eSChristoph Hellwighas at least 14 directly accessible bytes. The verifier marks it
22688691e9eSChristoph Hellwigas R3=pkt(id=0,off=0,r=14).
22788691e9eSChristoph Hellwigid=0 means that no additional variables were added to the register.
22888691e9eSChristoph Hellwigoff=0 means that no additional constants were added.
22988691e9eSChristoph Hellwigr=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
23088691e9eSChristoph HellwigNote that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
23188691e9eSChristoph Hellwigto the packet data, but constant 14 was added to the register, so
23288691e9eSChristoph Hellwigit now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
23388691e9eSChristoph Hellwigwhich is zero bytes.
23488691e9eSChristoph Hellwig
23588691e9eSChristoph HellwigMore complex packet access may look like::
23688691e9eSChristoph Hellwig
23788691e9eSChristoph Hellwig
23888691e9eSChristoph Hellwig    R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
23988691e9eSChristoph Hellwig    6:  r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
24088691e9eSChristoph Hellwig    7:  r4 = *(u8 *)(r3 +12)
24188691e9eSChristoph Hellwig    8:  r4 *= 14
24288691e9eSChristoph Hellwig    9:  r3 = *(u32 *)(r1 +76) /* load skb->data */
24388691e9eSChristoph Hellwig    10:  r3 += r4
24488691e9eSChristoph Hellwig    11:  r2 = r1
24588691e9eSChristoph Hellwig    12:  r2 <<= 48
24688691e9eSChristoph Hellwig    13:  r2 >>= 48
24788691e9eSChristoph Hellwig    14:  r3 += r2
24888691e9eSChristoph Hellwig    15:  r2 = r3
24988691e9eSChristoph Hellwig    16:  r2 += 8
25088691e9eSChristoph Hellwig    17:  r1 = *(u32 *)(r1 +80) /* load skb->data_end */
25188691e9eSChristoph Hellwig    18:  if r2 > r1 goto pc+2
25288691e9eSChristoph Hellwig    R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
25388691e9eSChristoph Hellwig    19:  r1 = *(u8 *)(r3 +4)
25488691e9eSChristoph Hellwig
25588691e9eSChristoph HellwigThe state of the register R3 is R3=pkt(id=2,off=0,r=8)
25688691e9eSChristoph Hellwigid=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
25788691e9eSChristoph Hellwigoffset within a packet and since the program author did
25888691e9eSChristoph Hellwig``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
25988691e9eSChristoph HellwigThe verifier only allows 'add'/'sub' operations on packet registers. Any other
26088691e9eSChristoph Hellwigoperation will set the register state to 'SCALAR_VALUE' and it won't be
26188691e9eSChristoph Hellwigavailable for direct packet access.
26288691e9eSChristoph Hellwig
26388691e9eSChristoph HellwigOperation ``r3 += rX`` may overflow and become less than original skb->data,
26488691e9eSChristoph Hellwigtherefore the verifier has to prevent that.  So when it sees ``r3 += rX``
26588691e9eSChristoph Hellwiginstruction and rX is more than 16-bit value, any subsequent bounds-check of r3
26688691e9eSChristoph Hellwigagainst skb->data_end will not give us 'range' information, so attempts to read
26788691e9eSChristoph Hellwigthrough the pointer will give "invalid access to packet" error.
26888691e9eSChristoph Hellwig
26988691e9eSChristoph HellwigEx. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
27088691e9eSChristoph HellwigR4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
27188691e9eSChristoph Hellwigof the register are guaranteed to be zero, and nothing is known about the lower
27288691e9eSChristoph Hellwig8 bits. After insn ``r4 *= 14`` the state becomes
27388691e9eSChristoph HellwigR4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
27488691e9eSChristoph Hellwigvalue by constant 14 will keep upper 52 bits as zero, also the least significant
27588691e9eSChristoph Hellwigbit will be zero as 14 is even.  Similarly ``r2 >>= 48`` will make
27688691e9eSChristoph HellwigR2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
27788691e9eSChristoph Hellwigextending.  This logic is implemented in adjust_reg_min_max_vals() function,
27888691e9eSChristoph Hellwigwhich calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
27988691e9eSChristoph Hellwigversa) and adjust_scalar_min_max_vals() for operations on two scalars.
28088691e9eSChristoph Hellwig
28188691e9eSChristoph HellwigThe end result is that bpf program author can access packet directly
28288691e9eSChristoph Hellwigusing normal C code as::
28388691e9eSChristoph Hellwig
28488691e9eSChristoph Hellwig  void *data = (void *)(long)skb->data;
28588691e9eSChristoph Hellwig  void *data_end = (void *)(long)skb->data_end;
28688691e9eSChristoph Hellwig  struct eth_hdr *eth = data;
28788691e9eSChristoph Hellwig  struct iphdr *iph = data + sizeof(*eth);
28888691e9eSChristoph Hellwig  struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
28988691e9eSChristoph Hellwig
29088691e9eSChristoph Hellwig  if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
29188691e9eSChristoph Hellwig	  return 0;
29288691e9eSChristoph Hellwig  if (eth->h_proto != htons(ETH_P_IP))
29388691e9eSChristoph Hellwig	  return 0;
29488691e9eSChristoph Hellwig  if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
29588691e9eSChristoph Hellwig	  return 0;
29688691e9eSChristoph Hellwig  if (udp->dest == 53 || udp->source == 9)
29788691e9eSChristoph Hellwig	  ...;
29888691e9eSChristoph Hellwig
29988691e9eSChristoph Hellwigwhich makes such programs easier to write comparing to LD_ABS insn
30088691e9eSChristoph Hellwigand significantly faster.
30188691e9eSChristoph Hellwig
30288691e9eSChristoph HellwigPruning
30388691e9eSChristoph Hellwig=======
30488691e9eSChristoph Hellwig
30588691e9eSChristoph HellwigThe verifier does not actually walk all possible paths through the program.  For
30688691e9eSChristoph Hellwigeach new branch to analyse, the verifier looks at all the states it's previously
30788691e9eSChristoph Hellwigbeen in when at this instruction.  If any of them contain the current state as a
30888691e9eSChristoph Hellwigsubset, the branch is 'pruned' - that is, the fact that the previous state was
30988691e9eSChristoph Hellwigaccepted implies the current state would be as well.  For instance, if in the
31088691e9eSChristoph Hellwigprevious state, r1 held a packet-pointer, and in the current state, r1 holds a
31188691e9eSChristoph Hellwigpacket-pointer with a range as long or longer and at least as strict an
31288691e9eSChristoph Hellwigalignment, then r1 is safe.  Similarly, if r2 was NOT_INIT before then it can't
31388691e9eSChristoph Hellwighave been used by any path from that point, so any value in r2 (including
31488691e9eSChristoph Hellwiganother NOT_INIT) is safe.  The implementation is in the function regsafe().
31588691e9eSChristoph HellwigPruning considers not only the registers but also the stack (and any spilled
31688691e9eSChristoph Hellwigregisters it may hold).  They must all be safe for the branch to be pruned.
31788691e9eSChristoph HellwigThis is implemented in states_equal().
31888691e9eSChristoph Hellwig
319*cb601848SEduard ZingermanSome technical details about state pruning implementation could be found below.
320*cb601848SEduard Zingerman
321*cb601848SEduard ZingermanRegister liveness tracking
322*cb601848SEduard Zingerman--------------------------
323*cb601848SEduard Zingerman
324*cb601848SEduard ZingermanIn order to make state pruning effective, liveness state is tracked for each
325*cb601848SEduard Zingermanregister and stack slot. The basic idea is to track which registers and stack
326*cb601848SEduard Zingermanslots are actually used during subseqeuent execution of the program, until
327*cb601848SEduard Zingermanprogram exit is reached. Registers and stack slots that were never used could be
328*cb601848SEduard Zingermanremoved from the cached state thus making more states equivalent to a cached
329*cb601848SEduard Zingermanstate. This could be illustrated by the following program::
330*cb601848SEduard Zingerman
331*cb601848SEduard Zingerman  0: call bpf_get_prandom_u32()
332*cb601848SEduard Zingerman  1: r1 = 0
333*cb601848SEduard Zingerman  2: if r0 == 0 goto +1
334*cb601848SEduard Zingerman  3: r0 = 1
335*cb601848SEduard Zingerman  --- checkpoint ---
336*cb601848SEduard Zingerman  4: r0 = r1
337*cb601848SEduard Zingerman  5: exit
338*cb601848SEduard Zingerman
339*cb601848SEduard ZingermanSuppose that a state cache entry is created at instruction #4 (such entries are
340*cb601848SEduard Zingermanalso called "checkpoints" in the text below). The verifier could reach the
341*cb601848SEduard Zingermaninstruction with one of two possible register states:
342*cb601848SEduard Zingerman
343*cb601848SEduard Zingerman* r0 = 1, r1 = 0
344*cb601848SEduard Zingerman* r0 = 0, r1 = 0
345*cb601848SEduard Zingerman
346*cb601848SEduard ZingermanHowever, only the value of register ``r1`` is important to successfully finish
347*cb601848SEduard Zingermanverification. The goal of the liveness tracking algorithm is to spot this fact
348*cb601848SEduard Zingermanand figure out that both states are actually equivalent.
349*cb601848SEduard Zingerman
350*cb601848SEduard ZingermanData structures
351*cb601848SEduard Zingerman~~~~~~~~~~~~~~~
352*cb601848SEduard Zingerman
353*cb601848SEduard ZingermanLiveness is tracked using the following data structures::
354*cb601848SEduard Zingerman
355*cb601848SEduard Zingerman  enum bpf_reg_liveness {
356*cb601848SEduard Zingerman	REG_LIVE_NONE = 0,
357*cb601848SEduard Zingerman	REG_LIVE_READ32 = 0x1,
358*cb601848SEduard Zingerman	REG_LIVE_READ64 = 0x2,
359*cb601848SEduard Zingerman	REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
360*cb601848SEduard Zingerman	REG_LIVE_WRITTEN = 0x4,
361*cb601848SEduard Zingerman	REG_LIVE_DONE = 0x8,
362*cb601848SEduard Zingerman  };
363*cb601848SEduard Zingerman
364*cb601848SEduard Zingerman  struct bpf_reg_state {
365*cb601848SEduard Zingerman 	...
366*cb601848SEduard Zingerman	struct bpf_reg_state *parent;
367*cb601848SEduard Zingerman 	...
368*cb601848SEduard Zingerman	enum bpf_reg_liveness live;
369*cb601848SEduard Zingerman 	...
370*cb601848SEduard Zingerman  };
371*cb601848SEduard Zingerman
372*cb601848SEduard Zingerman  struct bpf_stack_state {
373*cb601848SEduard Zingerman	struct bpf_reg_state spilled_ptr;
374*cb601848SEduard Zingerman	...
375*cb601848SEduard Zingerman  };
376*cb601848SEduard Zingerman
377*cb601848SEduard Zingerman  struct bpf_func_state {
378*cb601848SEduard Zingerman	struct bpf_reg_state regs[MAX_BPF_REG];
379*cb601848SEduard Zingerman        ...
380*cb601848SEduard Zingerman	struct bpf_stack_state *stack;
381*cb601848SEduard Zingerman  }
382*cb601848SEduard Zingerman
383*cb601848SEduard Zingerman  struct bpf_verifier_state {
384*cb601848SEduard Zingerman	struct bpf_func_state *frame[MAX_CALL_FRAMES];
385*cb601848SEduard Zingerman	struct bpf_verifier_state *parent;
386*cb601848SEduard Zingerman        ...
387*cb601848SEduard Zingerman  }
388*cb601848SEduard Zingerman
389*cb601848SEduard Zingerman* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
390*cb601848SEduard Zingerman  verifier state creation;
391*cb601848SEduard Zingerman
392*cb601848SEduard Zingerman* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
393*cb601848SEduard Zingerman  defined by some instruction verified between this verifier state's parent and
394*cb601848SEduard Zingerman  verifier state itself;
395*cb601848SEduard Zingerman
396*cb601848SEduard Zingerman* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
397*cb601848SEduard Zingerman  is read by a some child state of this verifier state;
398*cb601848SEduard Zingerman
399*cb601848SEduard Zingerman* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
400*cb601848SEduard Zingerman  processing same verifier state multiple times and for some sanity checks;
401*cb601848SEduard Zingerman
402*cb601848SEduard Zingerman* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
403*cb601848SEduard Zingerman  values using bitwise or.
404*cb601848SEduard Zingerman
405*cb601848SEduard ZingermanRegister parentage chains
406*cb601848SEduard Zingerman~~~~~~~~~~~~~~~~~~~~~~~~~
407*cb601848SEduard Zingerman
408*cb601848SEduard ZingermanIn order to propagate information between parent and child states, a *register
409*cb601848SEduard Zingermanparentage chain* is established. Each register or stack slot is linked to a
410*cb601848SEduard Zingermancorresponding register or stack slot in its parent state via a ``->parent``
411*cb601848SEduard Zingermanpointer. This link is established upon state creation in ``is_state_visited()``
412*cb601848SEduard Zingermanand might be modified by ``set_callee_state()`` called from
413*cb601848SEduard Zingerman``__check_func_call()``.
414*cb601848SEduard Zingerman
415*cb601848SEduard ZingermanThe rules for correspondence between registers / stack slots are as follows:
416*cb601848SEduard Zingerman
417*cb601848SEduard Zingerman* For the current stack frame, registers and stack slots of the new state are
418*cb601848SEduard Zingerman  linked to the registers and stack slots of the parent state with the same
419*cb601848SEduard Zingerman  indices.
420*cb601848SEduard Zingerman
421*cb601848SEduard Zingerman* For the outer stack frames, only caller saved registers (r6-r9) and stack
422*cb601848SEduard Zingerman  slots are linked to the registers and stack slots of the parent state with the
423*cb601848SEduard Zingerman  same indices.
424*cb601848SEduard Zingerman
425*cb601848SEduard Zingerman* When function call is processed a new ``struct bpf_func_state`` instance is
426*cb601848SEduard Zingerman  allocated, it encapsulates a new set of registers and stack slots. For this
427*cb601848SEduard Zingerman  new frame, parent links for r6-r9 and stack slots are set to nil, parent links
428*cb601848SEduard Zingerman  for r1-r5 are set to match caller r1-r5 parent links.
429*cb601848SEduard Zingerman
430*cb601848SEduard ZingermanThis could be illustrated by the following diagram (arrows stand for
431*cb601848SEduard Zingerman``->parent`` pointers)::
432*cb601848SEduard Zingerman
433*cb601848SEduard Zingerman      ...                    ; Frame #0, some instructions
434*cb601848SEduard Zingerman  --- checkpoint #0 ---
435*cb601848SEduard Zingerman  1 : r6 = 42                ; Frame #0
436*cb601848SEduard Zingerman  --- checkpoint #1 ---
437*cb601848SEduard Zingerman  2 : call foo()             ; Frame #0
438*cb601848SEduard Zingerman      ...                    ; Frame #1, instructions from foo()
439*cb601848SEduard Zingerman  --- checkpoint #2 ---
440*cb601848SEduard Zingerman      ...                    ; Frame #1, instructions from foo()
441*cb601848SEduard Zingerman  --- checkpoint #3 ---
442*cb601848SEduard Zingerman      exit                   ; Frame #1, return from foo()
443*cb601848SEduard Zingerman  3 : r1 = r6                ; Frame #0  <- current state
444*cb601848SEduard Zingerman
445*cb601848SEduard Zingerman             +-------------------------------+-------------------------------+
446*cb601848SEduard Zingerman             |           Frame #0            |           Frame #1            |
447*cb601848SEduard Zingerman  Checkpoint +-------------------------------+-------------------------------+
448*cb601848SEduard Zingerman  #0         | r0 | r1-r5 | r6-r9 | fp-8 ... |
449*cb601848SEduard Zingerman             +-------------------------------+
450*cb601848SEduard Zingerman                ^    ^       ^       ^
451*cb601848SEduard Zingerman                |    |       |       |
452*cb601848SEduard Zingerman  Checkpoint +-------------------------------+
453*cb601848SEduard Zingerman  #1         | r0 | r1-r5 | r6-r9 | fp-8 ... |
454*cb601848SEduard Zingerman             +-------------------------------+
455*cb601848SEduard Zingerman                     ^       ^       ^
456*cb601848SEduard Zingerman                     |_______|_______|_______________
457*cb601848SEduard Zingerman                             |       |               |
458*cb601848SEduard Zingerman               nil  nil      |       |               |      nil     nil
459*cb601848SEduard Zingerman                |    |       |       |               |       |       |
460*cb601848SEduard Zingerman  Checkpoint +-------------------------------+-------------------------------+
461*cb601848SEduard Zingerman  #2         | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
462*cb601848SEduard Zingerman             +-------------------------------+-------------------------------+
463*cb601848SEduard Zingerman                             ^       ^               ^       ^       ^
464*cb601848SEduard Zingerman               nil  nil      |       |               |       |       |
465*cb601848SEduard Zingerman                |    |       |       |               |       |       |
466*cb601848SEduard Zingerman  Checkpoint +-------------------------------+-------------------------------+
467*cb601848SEduard Zingerman  #3         | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
468*cb601848SEduard Zingerman             +-------------------------------+-------------------------------+
469*cb601848SEduard Zingerman                             ^       ^
470*cb601848SEduard Zingerman               nil  nil      |       |
471*cb601848SEduard Zingerman                |    |       |       |
472*cb601848SEduard Zingerman  Current    +-------------------------------+
473*cb601848SEduard Zingerman  state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
474*cb601848SEduard Zingerman             +-------------------------------+
475*cb601848SEduard Zingerman                             \
476*cb601848SEduard Zingerman                               r6 read mark is propagated via these links
477*cb601848SEduard Zingerman                               all the way up to checkpoint #1.
478*cb601848SEduard Zingerman                               The checkpoint #1 contains a write mark for r6
479*cb601848SEduard Zingerman                               because of instruction (1), thus read propagation
480*cb601848SEduard Zingerman                               does not reach checkpoint #0 (see section below).
481*cb601848SEduard Zingerman
482*cb601848SEduard ZingermanLiveness marks tracking
483*cb601848SEduard Zingerman~~~~~~~~~~~~~~~~~~~~~~~
484*cb601848SEduard Zingerman
485*cb601848SEduard ZingermanFor each processed instruction, the verifier tracks read and written registers
486*cb601848SEduard Zingermanand stack slots. The main idea of the algorithm is that read marks propagate
487*cb601848SEduard Zingermanback along the state parentage chain until they hit a write mark, which 'screens
488*cb601848SEduard Zingermanoff' earlier states from the read. The information about reads is propagated by
489*cb601848SEduard Zingermanfunction ``mark_reg_read()`` which could be summarized as follows::
490*cb601848SEduard Zingerman
491*cb601848SEduard Zingerman  mark_reg_read(struct bpf_reg_state *state, ...):
492*cb601848SEduard Zingerman      parent = state->parent
493*cb601848SEduard Zingerman      while parent:
494*cb601848SEduard Zingerman          if state->live & REG_LIVE_WRITTEN:
495*cb601848SEduard Zingerman              break
496*cb601848SEduard Zingerman          if parent->live & REG_LIVE_READ64:
497*cb601848SEduard Zingerman              break
498*cb601848SEduard Zingerman          parent->live |= REG_LIVE_READ64
499*cb601848SEduard Zingerman          state = parent
500*cb601848SEduard Zingerman          parent = state->parent
501*cb601848SEduard Zingerman
502*cb601848SEduard ZingermanNotes:
503*cb601848SEduard Zingerman
504*cb601848SEduard Zingerman* The read marks are applied to the **parent** state while write marks are
505*cb601848SEduard Zingerman  applied to the **current** state. The write mark on a register or stack slot
506*cb601848SEduard Zingerman  means that it is updated by some instruction in the straight-line code leading
507*cb601848SEduard Zingerman  from the parent state to the current state.
508*cb601848SEduard Zingerman
509*cb601848SEduard Zingerman* Details about REG_LIVE_READ32 are omitted.
510*cb601848SEduard Zingerman
511*cb601848SEduard Zingerman* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
512*cb601848SEduard Zingerman  might override the first parent link. Please refer to the comments in the
513*cb601848SEduard Zingerman  ``propagate_liveness()`` and ``mark_reg_read()`` source code for further
514*cb601848SEduard Zingerman  details.
515*cb601848SEduard Zingerman
516*cb601848SEduard ZingermanBecause stack writes could have different sizes ``REG_LIVE_WRITTEN`` marks are
517*cb601848SEduard Zingermanapplied conservatively: stack slots are marked as written only if write size
518*cb601848SEduard Zingermancorresponds to the size of the register, e.g. see function ``save_register_state()``.
519*cb601848SEduard Zingerman
520*cb601848SEduard ZingermanConsider the following example::
521*cb601848SEduard Zingerman
522*cb601848SEduard Zingerman  0: (*u64)(r10 - 8) = 0   ; define 8 bytes of fp-8
523*cb601848SEduard Zingerman  --- checkpoint #0 ---
524*cb601848SEduard Zingerman  1: (*u32)(r10 - 8) = 1   ; redefine lower 4 bytes
525*cb601848SEduard Zingerman  2: r1 = (*u32)(r10 - 8)  ; read lower 4 bytes defined at (1)
526*cb601848SEduard Zingerman  3: r2 = (*u32)(r10 - 4)  ; read upper 4 bytes defined at (0)
527*cb601848SEduard Zingerman
528*cb601848SEduard ZingermanAs stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
529*cb601848SEduard Zingermanit be otherwise, the algorithm above wouldn't be able to propagate the read mark
530*cb601848SEduard Zingermanfrom (3) to checkpoint #0.
531*cb601848SEduard Zingerman
532*cb601848SEduard ZingermanOnce the ``BPF_EXIT`` instruction is reached ``update_branch_counts()`` is
533*cb601848SEduard Zingermancalled to update the ``->branches`` counter for each verifier state in a chain
534*cb601848SEduard Zingermanof parent verifier states. When the ``->branches`` counter reaches zero the
535*cb601848SEduard Zingermanverifier state becomes a valid entry in a set of cached verifier states.
536*cb601848SEduard Zingerman
537*cb601848SEduard ZingermanEach entry of the verifier states cache is post-processed by a function
538*cb601848SEduard Zingerman``clean_live_states()``. This function marks all registers and stack slots
539*cb601848SEduard Zingermanwithout ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
540*cb601848SEduard ZingermanRegisters/stack slots marked in this way are ignored in function ``stacksafe()``
541*cb601848SEduard Zingermancalled from ``states_equal()`` when a state cache entry is considered for
542*cb601848SEduard Zingermanequivalence with a current state.
543*cb601848SEduard Zingerman
544*cb601848SEduard ZingermanNow it is possible to explain how the example from the beginning of the section
545*cb601848SEduard Zingermanworks::
546*cb601848SEduard Zingerman
547*cb601848SEduard Zingerman  0: call bpf_get_prandom_u32()
548*cb601848SEduard Zingerman  1: r1 = 0
549*cb601848SEduard Zingerman  2: if r0 == 0 goto +1
550*cb601848SEduard Zingerman  3: r0 = 1
551*cb601848SEduard Zingerman  --- checkpoint[0] ---
552*cb601848SEduard Zingerman  4: r0 = r1
553*cb601848SEduard Zingerman  5: exit
554*cb601848SEduard Zingerman
555*cb601848SEduard Zingerman* At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
556*cb601848SEduard Zingerman  is pushed to states processing queue (pc stands for program counter).
557*cb601848SEduard Zingerman
558*cb601848SEduard Zingerman* At instruction #4:
559*cb601848SEduard Zingerman
560*cb601848SEduard Zingerman  * ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
561*cb601848SEduard Zingerman  * ``checkpoint[0].r0`` is marked as written;
562*cb601848SEduard Zingerman  * ``checkpoint[0].r1`` is marked as read;
563*cb601848SEduard Zingerman
564*cb601848SEduard Zingerman* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
565*cb601848SEduard Zingerman  by ``clean_live_states()``. After this processing ``checkpoint[0].r0`` has a
566*cb601848SEduard Zingerman  read mark and all other registers and stack slots are marked as ``NOT_INIT``
567*cb601848SEduard Zingerman  or ``STACK_INVALID``
568*cb601848SEduard Zingerman
569*cb601848SEduard Zingerman* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
570*cb601848SEduard Zingerman  and is compared against a cached state ``{ r1 == 0, pc == 4 }``, the states
571*cb601848SEduard Zingerman  are considered equivalent.
572*cb601848SEduard Zingerman
573*cb601848SEduard Zingerman.. _read_marks_for_cache_hits:
574*cb601848SEduard Zingerman
575*cb601848SEduard ZingermanRead marks propagation for cache hits
576*cb601848SEduard Zingerman~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
577*cb601848SEduard Zingerman
578*cb601848SEduard ZingermanAnother point is the handling of read marks when a previously verified state is
579*cb601848SEduard Zingermanfound in the states cache. Upon cache hit verifier must behave in the same way
580*cb601848SEduard Zingermanas if the current state was verified to the program exit. This means that all
581*cb601848SEduard Zingermanread marks, present on registers and stack slots of the cached state, must be
582*cb601848SEduard Zingermanpropagated over the parentage chain of the current state. Example below shows
583*cb601848SEduard Zingermanwhy this is important. Function ``propagate_liveness()`` handles this case.
584*cb601848SEduard Zingerman
585*cb601848SEduard ZingermanConsider the following state parentage chain (S is a starting state, A-E are
586*cb601848SEduard Zingermanderived states, -> arrows show which state is derived from which)::
587*cb601848SEduard Zingerman
588*cb601848SEduard Zingerman                   r1 read
589*cb601848SEduard Zingerman            <-------------                A[r1] == 0
590*cb601848SEduard Zingerman                                          C[r1] == 0
591*cb601848SEduard Zingerman      S ---> A ---> B ---> exit           E[r1] == 1
592*cb601848SEduard Zingerman      |
593*cb601848SEduard Zingerman      ` ---> C ---> D
594*cb601848SEduard Zingerman      |
595*cb601848SEduard Zingerman      ` ---> E      ^
596*cb601848SEduard Zingerman                    |___   suppose all these
597*cb601848SEduard Zingerman             ^           states are at insn #Y
598*cb601848SEduard Zingerman             |
599*cb601848SEduard Zingerman      suppose all these
600*cb601848SEduard Zingerman    states are at insn #X
601*cb601848SEduard Zingerman
602*cb601848SEduard Zingerman* Chain of states ``S -> A -> B -> exit`` is verified first.
603*cb601848SEduard Zingerman
604*cb601848SEduard Zingerman* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
605*cb601848SEduard Zingerman  propagated up to state ``A``.
606*cb601848SEduard Zingerman
607*cb601848SEduard Zingerman* When chain of states ``C -> D`` is verified the state ``D`` turns out to be
608*cb601848SEduard Zingerman  equivalent to state ``B``.
609*cb601848SEduard Zingerman
610*cb601848SEduard Zingerman* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
611*cb601848SEduard Zingerman  ``C`` might get mistakenly marked as equivalent to state ``E`` even though
612*cb601848SEduard Zingerman  values for register ``r1`` differ between ``C`` and ``E``.
613*cb601848SEduard Zingerman
61488691e9eSChristoph HellwigUnderstanding eBPF verifier messages
61588691e9eSChristoph Hellwig====================================
61688691e9eSChristoph Hellwig
61788691e9eSChristoph HellwigThe following are few examples of invalid eBPF programs and verifier error
61888691e9eSChristoph Hellwigmessages as seen in the log:
61988691e9eSChristoph Hellwig
62088691e9eSChristoph HellwigProgram with unreachable instructions::
62188691e9eSChristoph Hellwig
62288691e9eSChristoph Hellwig  static struct bpf_insn prog[] = {
62388691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
62488691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
62588691e9eSChristoph Hellwig  };
62688691e9eSChristoph Hellwig
62743429ea7SWan JiabingError::
62888691e9eSChristoph Hellwig
62988691e9eSChristoph Hellwig  unreachable insn 1
63088691e9eSChristoph Hellwig
63188691e9eSChristoph HellwigProgram that reads uninitialized register::
63288691e9eSChristoph Hellwig
63388691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
63488691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
63588691e9eSChristoph Hellwig
63688691e9eSChristoph HellwigError::
63788691e9eSChristoph Hellwig
63888691e9eSChristoph Hellwig  0: (bf) r0 = r2
63988691e9eSChristoph Hellwig  R2 !read_ok
64088691e9eSChristoph Hellwig
64188691e9eSChristoph HellwigProgram that doesn't initialize R0 before exiting::
64288691e9eSChristoph Hellwig
64388691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
64488691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
64588691e9eSChristoph Hellwig
64688691e9eSChristoph HellwigError::
64788691e9eSChristoph Hellwig
64888691e9eSChristoph Hellwig  0: (bf) r2 = r1
64988691e9eSChristoph Hellwig  1: (95) exit
65088691e9eSChristoph Hellwig  R0 !read_ok
65188691e9eSChristoph Hellwig
65288691e9eSChristoph HellwigProgram that accesses stack out of bounds::
65388691e9eSChristoph Hellwig
65488691e9eSChristoph Hellwig    BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
65588691e9eSChristoph Hellwig    BPF_EXIT_INSN(),
65688691e9eSChristoph Hellwig
65788691e9eSChristoph HellwigError::
65888691e9eSChristoph Hellwig
65988691e9eSChristoph Hellwig    0: (7a) *(u64 *)(r10 +8) = 0
66088691e9eSChristoph Hellwig    invalid stack off=8 size=8
66188691e9eSChristoph Hellwig
66288691e9eSChristoph HellwigProgram that doesn't initialize stack before passing its address into function::
66388691e9eSChristoph Hellwig
66488691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
66588691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
66688691e9eSChristoph Hellwig  BPF_LD_MAP_FD(BPF_REG_1, 0),
66788691e9eSChristoph Hellwig  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
66888691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
66988691e9eSChristoph Hellwig
67088691e9eSChristoph HellwigError::
67188691e9eSChristoph Hellwig
67288691e9eSChristoph Hellwig  0: (bf) r2 = r10
67388691e9eSChristoph Hellwig  1: (07) r2 += -8
67488691e9eSChristoph Hellwig  2: (b7) r1 = 0x0
67588691e9eSChristoph Hellwig  3: (85) call 1
67688691e9eSChristoph Hellwig  invalid indirect read from stack off -8+0 size 8
67788691e9eSChristoph Hellwig
67888691e9eSChristoph HellwigProgram that uses invalid map_fd=0 while calling to map_lookup_elem() function::
67988691e9eSChristoph Hellwig
68088691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
68188691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
68288691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
68388691e9eSChristoph Hellwig  BPF_LD_MAP_FD(BPF_REG_1, 0),
68488691e9eSChristoph Hellwig  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
68588691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
68688691e9eSChristoph Hellwig
68788691e9eSChristoph HellwigError::
68888691e9eSChristoph Hellwig
68988691e9eSChristoph Hellwig  0: (7a) *(u64 *)(r10 -8) = 0
69088691e9eSChristoph Hellwig  1: (bf) r2 = r10
69188691e9eSChristoph Hellwig  2: (07) r2 += -8
69288691e9eSChristoph Hellwig  3: (b7) r1 = 0x0
69388691e9eSChristoph Hellwig  4: (85) call 1
69488691e9eSChristoph Hellwig  fd 0 is not pointing to valid bpf_map
69588691e9eSChristoph Hellwig
69688691e9eSChristoph HellwigProgram that doesn't check return value of map_lookup_elem() before accessing
69788691e9eSChristoph Hellwigmap element::
69888691e9eSChristoph Hellwig
69988691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
70088691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
70188691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
70288691e9eSChristoph Hellwig  BPF_LD_MAP_FD(BPF_REG_1, 0),
70388691e9eSChristoph Hellwig  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
70488691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
70588691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
70688691e9eSChristoph Hellwig
70788691e9eSChristoph HellwigError::
70888691e9eSChristoph Hellwig
70988691e9eSChristoph Hellwig  0: (7a) *(u64 *)(r10 -8) = 0
71088691e9eSChristoph Hellwig  1: (bf) r2 = r10
71188691e9eSChristoph Hellwig  2: (07) r2 += -8
71288691e9eSChristoph Hellwig  3: (b7) r1 = 0x0
71388691e9eSChristoph Hellwig  4: (85) call 1
71488691e9eSChristoph Hellwig  5: (7a) *(u64 *)(r0 +0) = 0
71588691e9eSChristoph Hellwig  R0 invalid mem access 'map_value_or_null'
71688691e9eSChristoph Hellwig
71788691e9eSChristoph HellwigProgram that correctly checks map_lookup_elem() returned value for NULL, but
71888691e9eSChristoph Hellwigaccesses the memory with incorrect alignment::
71988691e9eSChristoph Hellwig
72088691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
72188691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
72288691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
72388691e9eSChristoph Hellwig  BPF_LD_MAP_FD(BPF_REG_1, 0),
72488691e9eSChristoph Hellwig  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
72588691e9eSChristoph Hellwig  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
72688691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
72788691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
72888691e9eSChristoph Hellwig
72988691e9eSChristoph HellwigError::
73088691e9eSChristoph Hellwig
73188691e9eSChristoph Hellwig  0: (7a) *(u64 *)(r10 -8) = 0
73288691e9eSChristoph Hellwig  1: (bf) r2 = r10
73388691e9eSChristoph Hellwig  2: (07) r2 += -8
73488691e9eSChristoph Hellwig  3: (b7) r1 = 1
73588691e9eSChristoph Hellwig  4: (85) call 1
73688691e9eSChristoph Hellwig  5: (15) if r0 == 0x0 goto pc+1
73788691e9eSChristoph Hellwig   R0=map_ptr R10=fp
73888691e9eSChristoph Hellwig  6: (7a) *(u64 *)(r0 +4) = 0
73988691e9eSChristoph Hellwig  misaligned access off 4 size 8
74088691e9eSChristoph Hellwig
74188691e9eSChristoph HellwigProgram that correctly checks map_lookup_elem() returned value for NULL and
74288691e9eSChristoph Hellwigaccesses memory with correct alignment in one side of 'if' branch, but fails
74388691e9eSChristoph Hellwigto do so in the other side of 'if' branch::
74488691e9eSChristoph Hellwig
74588691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
74688691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
74788691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
74888691e9eSChristoph Hellwig  BPF_LD_MAP_FD(BPF_REG_1, 0),
74988691e9eSChristoph Hellwig  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
75088691e9eSChristoph Hellwig  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
75188691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
75288691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
75388691e9eSChristoph Hellwig  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
75488691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
75588691e9eSChristoph Hellwig
75688691e9eSChristoph HellwigError::
75788691e9eSChristoph Hellwig
75888691e9eSChristoph Hellwig  0: (7a) *(u64 *)(r10 -8) = 0
75988691e9eSChristoph Hellwig  1: (bf) r2 = r10
76088691e9eSChristoph Hellwig  2: (07) r2 += -8
76188691e9eSChristoph Hellwig  3: (b7) r1 = 1
76288691e9eSChristoph Hellwig  4: (85) call 1
76388691e9eSChristoph Hellwig  5: (15) if r0 == 0x0 goto pc+2
76488691e9eSChristoph Hellwig   R0=map_ptr R10=fp
76588691e9eSChristoph Hellwig  6: (7a) *(u64 *)(r0 +0) = 0
76688691e9eSChristoph Hellwig  7: (95) exit
76788691e9eSChristoph Hellwig
76888691e9eSChristoph Hellwig  from 5 to 8: R0=imm0 R10=fp
76988691e9eSChristoph Hellwig  8: (7a) *(u64 *)(r0 +0) = 1
77088691e9eSChristoph Hellwig  R0 invalid mem access 'imm'
77188691e9eSChristoph Hellwig
77288691e9eSChristoph HellwigProgram that performs a socket lookup then sets the pointer to NULL without
77388691e9eSChristoph Hellwigchecking it::
77488691e9eSChristoph Hellwig
77588691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_2, 0),
77688691e9eSChristoph Hellwig  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
77788691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
77888691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
77988691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_3, 4),
78088691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_4, 0),
78188691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_5, 0),
78288691e9eSChristoph Hellwig  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
78388691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_0, 0),
78488691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
78588691e9eSChristoph Hellwig
78688691e9eSChristoph HellwigError::
78788691e9eSChristoph Hellwig
78888691e9eSChristoph Hellwig  0: (b7) r2 = 0
78988691e9eSChristoph Hellwig  1: (63) *(u32 *)(r10 -8) = r2
79088691e9eSChristoph Hellwig  2: (bf) r2 = r10
79188691e9eSChristoph Hellwig  3: (07) r2 += -8
79288691e9eSChristoph Hellwig  4: (b7) r3 = 4
79388691e9eSChristoph Hellwig  5: (b7) r4 = 0
79488691e9eSChristoph Hellwig  6: (b7) r5 = 0
79588691e9eSChristoph Hellwig  7: (85) call bpf_sk_lookup_tcp#65
79688691e9eSChristoph Hellwig  8: (b7) r0 = 0
79788691e9eSChristoph Hellwig  9: (95) exit
79888691e9eSChristoph Hellwig  Unreleased reference id=1, alloc_insn=7
79988691e9eSChristoph Hellwig
80088691e9eSChristoph HellwigProgram that performs a socket lookup but does not NULL-check the returned
80188691e9eSChristoph Hellwigvalue::
80288691e9eSChristoph Hellwig
80388691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_2, 0),
80488691e9eSChristoph Hellwig  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
80588691e9eSChristoph Hellwig  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
80688691e9eSChristoph Hellwig  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
80788691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_3, 4),
80888691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_4, 0),
80988691e9eSChristoph Hellwig  BPF_MOV64_IMM(BPF_REG_5, 0),
81088691e9eSChristoph Hellwig  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
81188691e9eSChristoph Hellwig  BPF_EXIT_INSN(),
81288691e9eSChristoph Hellwig
81388691e9eSChristoph HellwigError::
81488691e9eSChristoph Hellwig
81588691e9eSChristoph Hellwig  0: (b7) r2 = 0
81688691e9eSChristoph Hellwig  1: (63) *(u32 *)(r10 -8) = r2
81788691e9eSChristoph Hellwig  2: (bf) r2 = r10
81888691e9eSChristoph Hellwig  3: (07) r2 += -8
81988691e9eSChristoph Hellwig  4: (b7) r3 = 4
82088691e9eSChristoph Hellwig  5: (b7) r4 = 0
82188691e9eSChristoph Hellwig  6: (b7) r5 = 0
82288691e9eSChristoph Hellwig  7: (85) call bpf_sk_lookup_tcp#65
82388691e9eSChristoph Hellwig  8: (95) exit
82488691e9eSChristoph Hellwig  Unreleased reference id=1, alloc_insn=7
825