==============
BPF Design Q&A
==============

BPF extensibility and applicability to networking, tracing and security
in the Linux kernel, together with several user space implementations of
the BPF virtual machine, have led to a number of misunderstandings about
what BPF actually is. This short Q&A is an attempt to address that and
outline the direction in which BPF is heading long term.

.. contents::
    :local:
    :depth: 3

Questions and Answers
=====================

Q: Is BPF a generic instruction set similar to x64 and arm64?
-------------------------------------------------------------
A: NO.

Q: Is BPF a generic virtual machine?
------------------------------------
A: NO.

BPF is a generic instruction set *with* C calling convention.
-------------------------------------------------------------

Q: Why was the C calling convention chosen?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Because BPF programs are designed to run in the Linux kernel,
which is written in C. Hence BPF defines an instruction set compatible
with the two most used architectures, x64 and arm64 (and takes into
consideration important quirks of other architectures), and
defines a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as the return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. The BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set
(unlike the x64 ISA, which allows msft, cdecl and other conventions).

Q: Can BPF programs access the instruction pointer or return address?
---------------------------------------------------------------------
A: NO.

Q: Can BPF programs access the stack pointer?
---------------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view it is necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF backend, but it makes sure that generated code never uses it.

Q: Does the C calling convention diminish possible use cases?
-------------------------------------------------------------
A: YES.

The BPF design forces the addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps, with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs, which are indistinguishable from
native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until the BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?
----------------------------------------
A: It's not clear yet.

BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in less than 4096 instructions.

Instruction level questions
---------------------------

Q: LD_ABS and LD_IND instructions vs C code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.
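
As a rough illustration of the 'direct packet access' style, the sketch
below reads the EtherType of a frame through ordinary pointer arithmetic
guarded by an explicit bounds check, instead of relying on the implicit
check that LD_ABS performs. It is plain user-space C, not a loadable BPF
program: ``struct pkt_ctx``, ``read_ethertype()`` and the constants are
hypothetical names, and the ``data``/``data_end`` pair only mimics the
kind of packet bounds a networking program is given.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a program context: the packet bytes live in
 * the half-open range [data, data_end). */
struct pkt_ctx {
    const uint8_t *data;
    const uint8_t *data_end;
};

#define ETH_HDR_LEN      14  /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETHERTYPE_OFFSET 12

/* Return the EtherType in host byte order, or -1 if the frame is too
 * short to contain a full Ethernet header. The length comparison is the
 * explicit bounds check that must precede any direct load. */
static int read_ethertype(const struct pkt_ctx *ctx)
{
    const uint8_t *p = ctx->data;

    if ((size_t)(ctx->data_end - ctx->data) < ETH_HDR_LEN)
        return -1;

    /* EtherType is big-endian on the wire. */
    return (p[ETHERTYPE_OFFSET] << 8) | p[ETHERTYPE_OFFSET + 1];
}
```

Under these assumptions, a 14-byte frame whose last two bytes are
0x08 0x00 yields 0x0800, while a 13-byte frame is rejected with -1.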

Q: BPF instructions mapping not one-to-one to native CPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: It seems not all BPF instructions are one-to-one to native CPU.
For example, why are BPF_JNE and other compare-and-jump instructions
not CPU-like?

A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

Q: Why doesn't the BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we had picked a one-to-one relationship to x64, it would
have been more complicated to support on arm64 and other archs. Also, it
needs a div-by-zero runtime check.

Q: Why is there no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. LLVM errors out in such a case and
prints a suggestion to use unsigned divide instead.

Q: Why does BPF have an implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because architectures like sparc have register windows, and in general
there are enough subtle differences between architectures that a naive
"store the return address on the stack" approach won't work. Another
reason is that BPF has to be safe from division by zero (and the legacy
exception path of the LD_ABS insn). Those instructions need to invoke
the epilogue and return implicitly.

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because classic BPF didn't have them and the BPF authors felt that a
compiler workaround would be acceptable. It turned out that programs lose
performance due to the lack of these compare instructions, so they were
added. These two instructions are a perfect example of what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have a one-to-one mapping to HW instructions
will not be accepted.
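
To make the compiler workaround concrete, the sketch below models a
fused, flag-less unsigned compare-and-jump and shows how "less than" can
be expressed with BPF_JGT by swapping the operands. It is a user-space
illustration only: ``jmp_taken()`` and ``jlt_via_jgt()`` are hypothetical
helpers, and the opcode values are quoted as they appear in the Linux
UAPI headers and should be checked there before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Jump opcodes as found in the Linux UAPI BPF headers (verify before reuse). */
enum { BPF_JGT = 0x20, BPF_JGE = 0x30, BPF_JLT = 0xa0, BPF_JLE = 0xb0 };

/* Hypothetical helper: evaluate one fused unsigned compare-and-jump.
 * Returns true when the branch would be taken. No condition flags are
 * involved; the comparison and the branch decision are a single step. */
static bool jmp_taken(int op, uint64_t dst, uint64_t src)
{
    switch (op) {
    case BPF_JGT: return dst > src;
    case BPF_JGE: return dst >= src;
    case BPF_JLT: return dst < src;
    case BPF_JLE: return dst <= src;
    default:      return false;  /* opcode not modelled in this sketch */
    }
}

/* Workaround without BPF_JLT: "dst < src" rewritten as "src > dst".
 * If src was an immediate, it typically has to be moved into a register
 * first, which is where the extra instruction comes from. */
static bool jlt_via_jgt(uint64_t dst, uint64_t src)
{
    return jmp_taken(BPF_JGT, src, dst);
}
```

Both forms give the same branch decision; the difference is only in how
many instructions and registers are needed to express it.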

Q: BPF 32-bit subregister requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits
of BPF registers, which makes BPF an inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?

A: NO. The first step to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. The second step
is to teach the verifier to mark operations where zeroing the upper bits
is unnecessary. JITs can then take advantage of those markings and
drastically reduce the size of generated code and improve performance.

Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, the set of helper
functions and their arguments, and recognized return codes are all part
of the ABI. However, when tracing programs use the bpf_probe_read() helper
to walk kernel internal data structures and are compiled with kernel
internal headers, these accesses can and will break with newer
kernels.
The union bpf_attr -> kern_version is checked at load time
to prevent accidentally loading kprobe-based BPF programs written
for a different kernel. Networking programs don't do the kern_version
check.

Q: How much stack space does a BPF program use?
-----------------------------------------------
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used,
and both the interpreter and most JITed code consume only the necessary
amount.

Q: Can BPF be offloaded to HW?
------------------------------
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does the classic BPF interpreter still exist?
------------------------------------------------
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
A: NO.

Tracing BPF programs can *read* arbitrary memory with the bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
-------------------------------------------
A: Sort-of.

Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such a
program is loaded the kernel will print a warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

Q: bpf_trace_printk() helper warning
------------------------------------
Q: When the bpf_trace_printk() helper is used the kernel prints a nasty
warning message. Why is that?

A: This is done to nudge program authors towards better interfaces when
programs need to pass data to user space. For example,
bpf_perf_event_output() can be used to efficiently stream data via the
perf ring buffer. BPF maps can be used for asynchronous data sharing
between kernel and user space.
bpf_trace_printk() should only be used for debugging.

Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc. be added out of kernel module code?

A: NO.