=============
eBPF verifier
=============

The safety of the eBPF program is determined in two steps.

The first step does a DAG check to disallow loops and performs other CFG
validation. In particular, it will detect programs that have unreachable
instructions (though the classic BPF checker allows them).

The second step starts from the first insn and descends all possible paths.
It simulates execution of every insn and observes the state change of
registers and stack.

At the start of the program the register R1 contains a pointer to context
and has type PTR_TO_CTX.
If the verifier sees an insn that does R2=R1, then R2 now has type
PTR_TO_CTX as well and can be used on the right hand side of an expression.
If R1=PTR_TO_CTX and the insn is R2=R1+R1, then R2=SCALAR_VALUE,
since the addition of two valid pointers makes an invalid pointer.
(In 'secure' mode the verifier will reject any type of pointer arithmetic to
make sure that kernel addresses don't leak to unprivileged users.)

If a register was never written to, it's not readable::

  bpf_mov R0 = R2
  bpf_exit

will be rejected, since R2 is unreadable at the start of the program.

After a kernel function call, R1-R5 are reset to unreadable and
R0 has the return type of the function.

Since R6-R9 are callee saved, their state is preserved across the call.

::

  bpf_mov R6 = 1
  bpf_call foo
  bpf_mov R0 = R6
  bpf_exit

is a correct program. If there was R1 instead of R6, it would have
been rejected.

load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
For example::

  bpf_mov R1 = 1
  bpf_mov R2 = 2
  bpf_xadd *(u32 *)(R1 + 3) += R2
  bpf_exit

will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.

At the start the R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``).
A callback is used to customize the verifier to restrict eBPF program access to
only certain fields within the ctx structure, with specified size and alignment.

For example, the following insn::

  bpf_ld R0 = *(u32 *)(R6 + 8)

intends to load a word from address R6 + 8 and store it into R0.
If R6=PTR_TO_CTX, via the is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
the verifier will reject the program.
If R6=PTR_TO_STACK, then the access should be aligned and be within
stack bounds, which are [-MAX_BPF_STACK, 0). In this example the offset is 8,
so it will fail verification, since it's out of bounds.

The verifier will allow an eBPF program to read data from the stack only after
it has written into it.

The classic BPF verifier does a similar check with the M[0-15] memory slots.
For example::

  bpf_ld R0 = *(u32 *)(R10 - 4)
  bpf_exit

is an invalid program.
Though R10 is a correct read-only register and has type PTR_TO_STACK
and R10 - 4 is within stack bounds, there were no stores into that location.

Pointer register spill/fill is tracked as well, since four (R6-R9)
callee saved registers may not be enough for some programs.

Allowed function calls are customized via bpf_verifier_ops->get_func_proto().
The eBPF verifier will check that registers match argument constraints.
After the call, register R0 will be set to the return type of the function.

Function calls are the main mechanism to extend the functionality of eBPF
programs. Socket filters may let programs call one set of functions, whereas
tracing filters may allow a completely different set.

If a function is made accessible to eBPF programs, it needs to be thought
through from a safety point of view. The verifier will guarantee that the
function is called with valid arguments.

seccomp vs socket filters have different security restrictions for classic BPF.
Seccomp solves this with a two stage verifier: the classic BPF verifier is
followed by the seccomp verifier. In case of eBPF one configurable verifier is
shared for all use cases.

See details of the eBPF verifier in kernel/bpf/verifier.c.

Register value tracking
=======================

In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot.
This is done with ``struct bpf_reg_state``, defined in
include/linux/bpf_verifier.h, which unifies tracking of scalar and pointer
values. Each register state has a type, which is either NOT_INIT (the register
has not been written to), SCALAR_VALUE (some value which is not usable as a
pointer), or a pointer type. The types of pointers describe their base, as
follows:

    PTR_TO_CTX
        Pointer to bpf_context.
    CONST_PTR_TO_MAP
        Pointer to struct bpf_map. "Const" because arithmetic
        on these pointers is forbidden.
    PTR_TO_MAP_VALUE
        Pointer to the value stored in a map element.
    PTR_TO_MAP_VALUE_OR_NULL
        Either a pointer to a map value, or NULL; map accesses
        (see maps.rst) return this type, which becomes a
        PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
        these pointers is forbidden.
    PTR_TO_STACK
        Frame pointer.
    PTR_TO_PACKET
        skb->data.
    PTR_TO_PACKET_END
        skb->data + headlen; arithmetic forbidden.
    PTR_TO_SOCKET
        Pointer to struct bpf_sock_ops, implicitly refcounted.
    PTR_TO_SOCKET_OR_NULL
        Either a pointer to a socket, or NULL; socket lookup
        returns this type, which becomes a PTR_TO_SOCKET when
        checked != NULL. PTR_TO_SOCKET is reference-counted,
        so programs must release the reference through the
        socket release function before the end of the program.
        Arithmetic on these pointers is forbidden.

However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
operand) is added to a pointer, while the latter is used for values which are
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
the range of possible values in the register.

The verifier's knowledge about the variable offset consists of:

* minimum and maximum values as unsigned
* minimum and maximum values as signed

* knowledge of the values of individual bits, in the form of a 'tnum': a u64
  'mask' and a u64 'value'. 1s in the mask represent bits whose value is
  unknown; 1s in the value represent bits known to be 1. Bits known to be 0
  have 0 in both mask and value; no bit should ever be 1 in both.
  For example, if a byte is read
  into a register from memory, the register's top 56 bits are known zero, while
  the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
  then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
  0x1ff), because of potential carries.

Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
BPF_JSGE) would instead update the signed minimum/maximum values. Information
from the signed and unsigned bounds can be combined; for instance if a value is
first tested < 8 and then tested s> 4, the verifier will conclude that the value
is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.

PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
pointers sharing that same variable offset. This is important for packet range
checks: after adding a variable to a packet pointer register A, if you then copy
it to another register B and then add a constant 4 to A, both registers will
share the same 'id' but A will have a fixed offset of +4.
Then if A is
bounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
now known to have a safe range of at least 4 bytes. See 'Direct packet access',
below, for more on PTR_TO_PACKET ranges.

The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
the pointer returned from a map lookup. This means that when one copy is
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.

As well as range-checking, the tracked information is also used for enforcing
alignment of pointer accesses. For instance, on most systems the packet pointer
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.

The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding ``struct sock``.
To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and, in
the non-NULL case, pass the valid reference to the socket release function.

Direct packet access
====================

In cls_bpf and act_bpf programs the verifier allows direct access to the packet
data via the skb->data and skb->data_end pointers.
Ex::

  1:  r4 = *(u32 *)(r1 +80)   /* load skb->data_end */
  2:  r3 = *(u32 *)(r1 +76)   /* load skb->data */
  3:  r5 = r3
  4:  r5 += 14
  5:  if r5 > r4 goto pc+16
  R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
  6:  r0 = *(u16 *)(r3 +12)   /* access bytes 12 and 13 of the packet */

This 2-byte load from the packet is safe to do, since the program author
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
id=0 means that no additional variables were added to the register.
off=0 means that no additional constants were added.
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.

Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
it now points to ``skb->data + 14`` and the accessible range is
[R5, R5 + 14 - 14) which is zero bytes.

More complex packet access may look like::

  R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
  6:  r0 = *(u8 *)(r3 +7)     /* load 7th byte from the packet */
  7:  r4 = *(u8 *)(r3 +12)
  8:  r4 *= 14
  9:  r3 = *(u32 *)(r1 +76)   /* load skb->data */
  10: r3 += r4
  11: r2 = r1
  12: r2 <<= 48
  13: r2 >>= 48
  14: r3 += r2
  15: r2 = r3
  16: r2 += 8
  17: r1 = *(u32 *)(r1 +80)   /* load skb->data_end */
  18: if r2 > r1 goto pc+2
  R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
  19: r1 = *(u8 *)(r3 +4)

The state of the register R3 is R3=pkt(id=2,off=0,r=8).
id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
offset within a packet, and since the program author did
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access.

Operation ``r3 += rX`` may overflow and become less than the original
skb->data, therefore the verifier has to prevent that. So when it sees an
``r3 += rX`` instruction and rX is more than a 16-bit value, any subsequent
bounds-check of r3 against skb->data_end will not give us 'range' information,
so attempts to read through the pointer will give an "invalid access to packet"
error.

Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that the upper 56
bits of the register are guaranteed to be zero, and nothing is known about the
lower 8 bits. After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
value by constant 14 will keep the upper 52 bits as zero, and the least
significant bit will also be zero as 14 is even. Similarly ``r2 >>= 48`` will
make R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is
not sign extending. This logic is implemented in the adjust_reg_min_max_vals()
function, which calls adjust_ptr_min_max_vals() for adding a pointer to a
scalar (or vice versa) and adjust_scalar_min_max_vals() for operations on two
scalars.

The end result is that a bpf program author can access the packet directly
using normal C code as::

  void *data = (void *)(long)skb->data;
  void *data_end = (void *)(long)skb->data_end;
  struct eth_hdr *eth = data;
  struct iphdr *iph = data + sizeof(*eth);
  struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);

  if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
          return 0;
  if (eth->h_proto != htons(ETH_P_IP))
          return 0;
  if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
          return 0;
  if (udp->dest == 53 || udp->source == 9)
          ...;

which makes such programs easier to write compared to the LD_ABS insn
and significantly faster.

Pruning
=======

The verifier does not actually walk all possible paths through the program. For
each new branch to analyse, the verifier looks at all the states it's previously
been in when at this instruction. If any of them contain the current state as a
subset, the branch is 'pruned' - that is, the fact that the previous state was
accepted implies the current state would be as well.
For instance, if in the
previous state, r1 held a packet-pointer, and in the current state, r1 holds a
packet-pointer with a range as long or longer and at least as strict an
alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before then it can't
have been used by any path from that point, so any value in r2 (including
another NOT_INIT) is safe. The implementation is in the function regsafe().
Pruning considers not only the registers but also the stack (and any spilled
registers it may hold). They must all be safe for the branch to be pruned.
This is implemented in states_equal().

Some technical details about the state pruning implementation can be found
below.

Register liveness tracking
--------------------------

In order to make state pruning effective, liveness state is tracked for each
register and stack slot. The basic idea is to track which registers and stack
slots are actually used during subsequent execution of the program, until
program exit is reached. Registers and stack slots that were never used can be
removed from the cached state, thus making more states equivalent to a cached
state.
This could be illustrated by the following program::

    0: call bpf_get_prandom_u32()
    1: r1 = 0
    2: if r0 == 0 goto +1
    3: r0 = 1
    --- checkpoint ---
    4: r0 = r1
    5: exit

Suppose that a state cache entry is created at instruction #4 (such entries are
also called "checkpoints" in the text below). The verifier could reach the
instruction with one of two possible register states:

* r0 = 1, r1 = 0
* r0 = 0, r1 = 0

However, only the value of register ``r1`` is important to successfully finish
verification. The goal of the liveness tracking algorithm is to spot this fact
and figure out that both states are actually equivalent.

Data structures
~~~~~~~~~~~~~~~

Liveness is tracked using the following data structures::

    enum bpf_reg_liveness {
            REG_LIVE_NONE = 0,
            REG_LIVE_READ32 = 0x1,
            REG_LIVE_READ64 = 0x2,
            REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
            REG_LIVE_WRITTEN = 0x4,
            REG_LIVE_DONE = 0x8,
    };

    struct bpf_reg_state {
            ...
            struct bpf_reg_state *parent;
            ...
            enum bpf_reg_liveness live;
            ...
    };

    struct bpf_stack_state {
            struct bpf_reg_state spilled_ptr;
            ...
    };

    struct bpf_func_state {
            struct bpf_reg_state regs[MAX_BPF_REG];
            ...
            struct bpf_stack_state *stack;
    }

    struct bpf_verifier_state {
            struct bpf_func_state *frame[MAX_CALL_FRAMES];
            struct bpf_verifier_state *parent;
            ...
    }

* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
  verifier state creation;

* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
  defined by some instruction verified between this verifier state's parent and
  the verifier state itself;

* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
  is read by some child state of this verifier state;

* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
  processing the same verifier state multiple times and for some sanity checks;

* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
  values using bitwise or.

Register parentage chains
~~~~~~~~~~~~~~~~~~~~~~~~~

In order to propagate information between parent and child states, a *register
parentage chain* is established. Each register or stack slot is linked to a
corresponding register or stack slot in its parent state via a ``->parent``
pointer. This link is established upon state creation in ``is_state_visited()``
and might be modified by ``set_callee_state()`` called from
``__check_func_call()``.

The rules for correspondence between registers / stack slots are as follows:

* For the current stack frame, registers and stack slots of the new state are
  linked to the registers and stack slots of the parent state with the same
  indices.

* For the outer stack frames, only callee saved registers (r6-r9) and stack
  slots are linked to the registers and stack slots of the parent state with
  the same indices.

* When a function call is processed a new ``struct bpf_func_state`` instance is
  allocated, it encapsulates a new set of registers and stack slots. For this
  new frame, parent links for r6-r9 and stack slots are set to nil, parent
  links for r1-r5 are set to match caller r1-r5 parent links.

This could be illustrated by the following diagram (arrows stand for
``->parent`` pointers)::

      ...                      ; Frame #0, some instructions
  --- checkpoint #0 ---
  1 : r6 = 42                  ; Frame #0
  --- checkpoint #1 ---
  2 : call foo()               ; Frame #0
      ...                      ; Frame #1, instructions from foo()
  --- checkpoint #2 ---
      ...                      ; Frame #1, instructions from foo()
  --- checkpoint #3 ---
      exit                     ; Frame #1, return from foo()
  3 : r1 = r6                  ; Frame #0 <- current state

             +-------------------------------+-------------------------------+
             |           Frame #0            |           Frame #1            |
  Checkpoint +-------------------------------+-------------------------------+
  #0         | r0 | r1-r5 | r6-r9 | fp-8 ... |
             +-------------------------------+
                ^      ^       ^       ^
                |      |       |       |
  Checkpoint +-------------------------------+
  #1         | r0 | r1-r5 | r6-r9 | fp-8 ... |
             +-------------------------------+
                       ^       ^       ^
                       |_______|_______|_____________________
                               |       |                     |
              nil    nil       |       |        nil          |         nil  nil
               |      |        |       |         |           |          |    |
  Checkpoint +-------------------------------+-------------------------------+
  #2         | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
             +-------------------------------+-------------------------------+
                               ^       ^                     ^          ^    ^
              nil    nil       |       |        nil          |          |    |
               |      |        |       |         |           |          |    |
  Checkpoint +-------------------------------+-------------------------------+
  #3         | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
             +-------------------------------+-------------------------------+
                               ^       ^
              nil    nil       |       |
               |      |        |       |
  Current    +-------------------------------+
  state      | r0 | r1-r5 | r6-r9 | fp-8 ... |
             +-------------------------------+
                                \
                                  r6 read mark is propagated via these links
                                  all the way up to checkpoint #1.
                                  The checkpoint #1 contains a write mark for r6
                                  because of instruction (1), thus read propagation
                                  does not reach checkpoint #0 (see section below).

Liveness marks tracking
~~~~~~~~~~~~~~~~~~~~~~~

For each processed instruction, the verifier tracks read and written registers
and stack slots. The main idea of the algorithm is that read marks propagate
back along the state parentage chain until they hit a write mark, which 'screens
off' earlier states from the read. The information about reads is propagated by
function ``mark_reg_read()``, which could be summarized as follows::

    mark_reg_read(struct bpf_reg_state *state, ...):
        parent = state->parent
        while parent:
            if state->live & REG_LIVE_WRITTEN:
                break
            if parent->live & REG_LIVE_READ64:
                break
            parent->live |= REG_LIVE_READ64
            state = parent
            parent = state->parent

Notes:

* The read marks are applied to the **parent** state while write marks are
  applied to the **current** state. The write mark on a register or stack slot
  means that it is updated by some instruction in the straight-line code leading
  from the parent state to the current state.

* Details about ``REG_LIVE_READ32`` are omitted.

* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
  might override the first parent link. Please refer to the comments in the
  ``propagate_liveness()`` and ``mark_reg_read()`` source code for further
  details.

Because stack writes could have different sizes, ``REG_LIVE_WRITTEN`` marks are
applied conservatively: stack slots are marked as written only if the write size
corresponds to the size of the register, e.g. see function ``save_register_state()``.

Consider the following example::

    0: (*u64)(r10 - 8) = 0   ; define 8 bytes of fp-8
    --- checkpoint #0 ---
    1: (*u32)(r10 - 8) = 1   ; redefine lower 4 bytes
    2: r1 = (*u32)(r10 - 8)  ; read lower 4 bytes defined at (1)
    3: r2 = (*u32)(r10 - 4)  ; read upper 4 bytes defined at (0)

As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
it be otherwise, the algorithm above wouldn't be able to propagate the read mark
from (3) to checkpoint #0.

Once the ``BPF_EXIT`` instruction is reached, ``update_branch_counts()`` is
called to update the ``->branches`` counter for each verifier state in a chain
of parent verifier states.
When the ``->branches`` counter reaches zero, the
verifier state becomes a valid entry in a set of cached verifier states.

Each entry of the verifier states cache is post-processed by a function
``clean_live_states()``. This function marks all registers and stack slots
without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
Registers/stack slots marked in this way are ignored in function ``stacksafe()``
called from ``states_equal()`` when a state cache entry is considered for
equivalence with a current state.

Now it is possible to explain how the example from the beginning of the section
works::

    0: call bpf_get_prandom_u32()
    1: r1 = 0
    2: if r0 == 0 goto +1
    3: r0 = 1
    --- checkpoint[0] ---
    4: r0 = r1
    5: exit

* At instruction #2 a branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
  is pushed to the states processing queue (pc stands for program counter).

* At instruction #4:

  * ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
  * ``checkpoint[0].r0`` is marked as written;
  * ``checkpoint[0].r1`` is marked as read;

* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
  by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
  read mark and all other registers and stack slots are marked as ``NOT_INIT``
  or ``STACK_INVALID``.

* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
  and is compared against the cached state ``{ r1 == 0, pc == 4 }``; the states
  are considered equivalent.

.. _read_marks_for_cache_hits:

Read marks propagation for cache hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another point is the handling of read marks when a previously verified state is
found in the states cache. Upon a cache hit, the verifier must behave in the
same way as if the current state had been verified to the program exit. This
means that all read marks present on registers and stack slots of the cached
state must be propagated over the parentage chain of the current state. The
example below shows why this is important.
Function ``propagate_liveness()`` handles this case.

Consider the following state parentage chain (S is a starting state, A-E are
derived states, -> arrows show which state is derived from which)::

                  r1 read
              <-------------                A[r1] == 0
                                            C[r1] == 0
      S ---> A ---> B ---> exit             E[r1] == 1
             |
             ` ---> C ---> D
             |
             ` ---> E      ^
             |             |___  suppose all these
             ^                   states are at insn #Y
             |
        suppose all these
        states are at insn #X

* Chain of states ``S -> A -> B -> exit`` is verified first.

* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
  propagated up to state ``A``.

* When the chain of states ``C -> D`` is verified, the state ``D`` turns out to
  be equivalent to state ``B``.

* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
  ``C`` might get mistakenly marked as equivalent to state ``E`` even though
  values for register ``r1`` differ between ``C`` and ``E``.
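The cache-hit scenario above can be sketched in a few lines of Python. This is
a simplified model, not kernel code: only a single register is tracked, and
``propagate_liveness`` here is a drastically reduced stand-in for the kernel
function of the same name:

```python
# Liveness flags, mirroring the kernel's REG_LIVE_* values in spirit only.
REG_LIVE_READ64 = 1
REG_LIVE_WRITTEN = 2

class RegState:
    """Stand-in for register r1's slice of a verifier state."""
    def __init__(self, parent=None, live=0):
        self.parent = parent  # ->parent link in the state parentage chain
        self.live = live

def mark_reg_read(state):
    # Same propagation rule as the mark_reg_read() summary above.
    parent = state.parent
    while parent:
        if state.live & REG_LIVE_WRITTEN:
            break
        if parent.live & REG_LIVE_READ64:
            break
        parent.live |= REG_LIVE_READ64
        state = parent
        parent = state.parent

def propagate_liveness(cached, current):
    # On a cache hit, replay the cached state's read marks over the
    # parentage chain of the current state.
    if cached.live & REG_LIVE_READ64:
        mark_reg_read(current)

# Parentage chains for r1: S -> A -> B -> exit and S -> A -> C -> D.
S = RegState()
A = RegState(parent=S)
B = RegState(parent=A)
exit_state = RegState(parent=B)
C = RegState(parent=A)
D = RegState(parent=C)

mark_reg_read(exit_state)  # r1 is read while verifying B -> exit,
                           # marking B, A and S as read
propagate_liveness(B, D)   # D is found equivalent to cached B:
                           # the read mark must also reach C
```

After ``propagate_liveness(B, D)`` the read mark for r1 is present on state C,
so C can no longer be mistaken for a state where r1 is irrelevant.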

Understanding eBPF verifier messages
====================================

The following are a few examples of invalid eBPF programs and verifier error
messages as seen in the log:

Program with unreachable instructions::

  static struct bpf_insn prog[] = {
      BPF_EXIT_INSN(),
      BPF_EXIT_INSN(),
  };

Error::

  unreachable insn 1

Program that reads an uninitialized register::

  BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r0 = r2
  R2 !read_ok

Program that doesn't initialize R0 before exiting::

  BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r2 = r1
  1: (95) exit
  R0 !read_ok

Program that accesses stack out of bounds::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 +8) = 0
  invalid stack off=8 size=8

Program that doesn't initialize stack before passing its address into a
function::

  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (b7) r1 = 0x0
  3: (85) call 1
  invalid indirect read from stack off -8+0 size 8

Program that uses invalid map_fd=0 while calling the map_lookup_elem()
function::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0x0
  4: (85) call 1
  fd 0 is not pointing to valid bpf_map

Program that doesn't check the return value of map_lookup_elem() before
accessing a map element::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0x0
  4: (85) call 1
  5: (7a) *(u64 *)(r0 +0) = 0
  R0 invalid mem access 'map_value_or_null'

Program that correctly checks map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 1
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+1
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +4) = 0
  misaligned access off 4 size 8

Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment on one side of the 'if' branch, but
fails to do so on the other side::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
  BPF_EXIT_INSN(),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 1
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+2
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +0) = 0
  7: (95) exit

  from 5 to 8: R0=imm0 R10=fp
  8: (7a) *(u64 *)(r0 +0) = 1
  R0 invalid mem access 'imm'

Program that performs a socket lookup, then sets the pointer to NULL without
checking it::

  BPF_MOV64_IMM(BPF_REG_2, 0),
  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_MOV64_IMM(BPF_REG_3, 4),
  BPF_MOV64_IMM(BPF_REG_4, 0),
  BPF_MOV64_IMM(BPF_REG_5, 0),
  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
  BPF_MOV64_IMM(BPF_REG_0, 0),
  BPF_EXIT_INSN(),

Error::

  0: (b7) r2 = 0
  1: (63) *(u32 *)(r10 -8) = r2
  2: (bf) r2 = r10
  3: (07) r2 += -8
  4: (b7) r3 = 4
  5: (b7) r4 = 0
  6: (b7) r5 = 0
  7: (85) call bpf_sk_lookup_tcp#65
  8: (b7) r0 = 0
  9: (95) exit
  Unreleased reference id=1, alloc_insn=7

Program that performs a socket lookup but does not NULL-check the returned
value::

  BPF_MOV64_IMM(BPF_REG_2, 0),
  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_MOV64_IMM(BPF_REG_3, 4),
  BPF_MOV64_IMM(BPF_REG_4, 0),
  BPF_MOV64_IMM(BPF_REG_5, 0),
  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
  BPF_EXIT_INSN(),

Error::

  0: (b7) r2 = 0
  1: (63) *(u32 *)(r10 -8) = r2
  2: (bf) r2 = r10
  3: (07) r2 += -8
  4: (b7) r3 = 4
  5: (b7) r4 = 0
  6: (b7) r5 = 0
  7: (85) call bpf_sk_lookup_tcp#65
  8: (95) exit
  Unreleased reference id=1, alloc_insn=7