BPF extensibility and applicability to networking, tracing, security
in the linux kernel and several user space implementations of BPF
virtual machine led to a number of misunderstandings on what BPF actually is.
This short QA is an attempt to address that and outline a direction
of where BPF is heading long term.

Q: Is BPF a generic instruction set similar to x64 and arm64?
A: NO.

Q: Is BPF a generic virtual machine?
A: NO.

BPF is a generic instruction set _with_ C calling convention.

Q: Why was C calling convention chosen?
A: Because BPF programs are designed to run in the linux kernel
   which is written in C, hence BPF defines an instruction set compatible
   with the two most used architectures x64 and arm64 (and takes into
   consideration important quirks of other architectures) and
   defines a calling convention that is compatible with the C calling
   convention of the linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
A: NO. BPF allows only register R0 to be used as return value.

Q: Can more than 5 function arguments be supported in the future?
A: NO. BPF calling convention only allows registers R1-R5 to be used
   as arguments. BPF is not a standalone instruction set.
   (unlike x64 ISA that allows msft, cdecl and other conventions)

Q: Can BPF programs access instruction pointer or return address?
A: NO.

Q: Can BPF programs access stack pointer?
A: NO. Only frame pointer (register R10) is accessible.
   From compiler point of view it's necessary to have stack pointer.
   For example LLVM defines register R11 as stack pointer in its
   BPF backend, but it makes sure that generated code never uses it.

Q: Does C calling convention diminish possible use cases?
A: YES. BPF design forces addition of major functionality in the form
   of kernel helper functions and kernel objects like BPF maps with
   seamless interoperability between them. It lets kernel call into
   BPF programs and programs call kernel helpers with zero overhead,
   as if all of them were native C code. That is particularly the case
   for JITed BPF programs that are indistinguishable from
   native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
A: Soft yes.
   At least for now until BPF core has support for
   bpf-to-bpf calls, indirect calls, loops, global variables,
   jump tables, read only sections and all other normal constructs
   that C code can produce.

Q: Can loops be supported in a safe way?
A: It's not clear yet. BPF developers are trying to find a way to
   support bounded loops where the verifier can guarantee that
   the program terminates in less than 4096 instructions.

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
   C code cannot express them and has to use builtin intrinsics?
A: This is an artifact of compatibility with classic BPF. Modern
   networking code in BPF performs better without them.
   See 'direct packet access'.

Q: It seems not all BPF instructions are one-to-one to native CPU.
   For example why BPF_JNE and other compare and jumps are not cpu-like?
A: This was necessary to avoid introducing flags into ISA which are
   impossible to make generic and efficient across CPU architectures.

Q: Why doesn't BPF_DIV instruction map to x64 div?
A: Because if we picked one-to-one relationship to x64 it would have made
   it more complicated to support on arm64 and other archs.
   Also it needs div-by-zero runtime check.

Q: Why is there no BPF_SDIV for signed divide operation?
A: Because it would be rarely used. llvm errors in such case and
   prints a suggestion to use unsigned divide instead.

Q: Why does BPF have implicit prologue and epilogue?
A: Because architectures like sparc have register windows and in general
   there are enough subtle differences between architectures, so naive
   store return address into stack won't work. Another reason is BPF has
   to be safe from division by zero (and legacy exception path
   of LD_ABS insn). Those instructions need to invoke epilogue and
   return implicitly.

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
A: Because classic BPF didn't have them and BPF authors felt that compiler
   workaround would be acceptable. Turned out that programs lose performance
   due to lack of these compare instructions and they were added.
   These two instructions are a perfect example of what kind of new BPF
   instructions are acceptable and can be added in the future.
   These two already had equivalent instructions in native CPUs.
   New instructions that don't have one-to-one mapping to HW instructions
   will not be accepted.
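
   The compiler workaround mentioned in the BPF_JLT/BPF_JLE answer can be
   modeled in plain C. This is only an illustrative sketch, not kernel or
   LLVM code: the function names are hypothetical, and it assumes that
   without BPF_JLT a compare such as "x < imm" has to be rewritten with
   swapped operands as "imm > x", which requires the immediate to be
   moved into a scratch register first. The immediate is assumed to be a
   32-bit value sign-extended to 64 bits before the unsigned compare.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "if x < imm goto target" when a less-than
 * jump exists: one compare-and-jump that takes the immediate directly. */
static bool branch_taken_with_jlt(uint64_t x, int32_t imm)
{
	/* Assumption: the 32-bit immediate is sign-extended to 64 bits. */
	return x < (uint64_t)(int64_t)imm;
}

/* Hypothetical model of the same branch when only a greater-than jump
 * exists: the operands are swapped, so the immediate must first be
 * copied into a scratch register. */
static bool branch_taken_with_jgt_only(uint64_t x, int32_t imm)
{
	uint64_t tmp = (uint64_t)(int64_t)imm;	/* extra step: tmp = imm */

	return tmp > x;				/* if tmp > x goto target */
}
```

   Both functions decide the branch identically. The second form spends an
   extra move and a scratch register on every such compare, which is the
   kind of cost the answer above refers to.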

Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF
   registers which makes BPF an inefficient virtual machine for 32-bit
   CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
   be added to BPF in the future?
A: NO. The first thing to improve performance on 32-bit archs is to teach
   LLVM to generate code that uses 32-bit subregisters. Then second step
   is to teach verifier to mark operations where zero-ing upper bits
   is unnecessary. Then JITs can take advantage of those markings and
   drastically reduce size of generated code and improve performance.

Q: Does BPF have a stable ABI?
A: YES. BPF instructions, arguments to BPF programs, set of helper
   functions and their arguments, recognized return codes are all part
   of ABI. However when tracing programs are using bpf_probe_read() helper
   to walk kernel internal data structures and compile with kernel
   internal headers these accesses can and will break with newer
   kernels. The union bpf_attr -> kern_version is checked at load time
   to prevent accidentally loading kprobe-based bpf programs written
   for a different kernel. Networking programs don't do kern_version check.

Q: How much stack space does a BPF program use?
A: Currently all program types are limited to 512 bytes of stack
   space, but the verifier computes the actual amount of stack used
   and both interpreter and most JITed code consume only the necessary
   amount.

Q: Can BPF be offloaded to HW?
A: YES. BPF HW offload is supported by NFP driver.

Q: Does classic BPF interpreter still exist?
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
A: NO. BPF programs can only call a set of helper functions which
   is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read()
   and bpf_probe_read_str() helpers. Networking programs cannot read
   arbitrary memory, since they don't have access to these helpers.
   Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
A: Sort-of. Tracing BPF programs can overwrite the user memory
   of the current task with bpf_probe_write_user().
   Every time such a program is loaded the kernel will print a warning
   message, so this helper is only useful for experiments and prototypes.
   Tracing BPF programs are root only.

Q: When bpf_trace_printk() helper is used the kernel prints nasty
   warning message. Why is that?
A: This is done to nudge program authors into better interfaces when
   programs need to pass data to user space. For example,
   bpf_perf_event_output() can be used to efficiently stream data via
   perf ring buffer. BPF maps can be used for asynchronous data sharing
   between kernel and user space. bpf_trace_printk() should only be used
   for debugging.

Q: Can BPF functionality such as new program or map types, new
   helpers, etc be added out of kernel module code?
A: NO.