==============
BPF Design Q&A
==============

BPF's extensibility and applicability to networking, tracing and security
in the Linux kernel, together with several user space implementations of
the BPF virtual machine, have led to a number of misunderstandings about
what BPF actually is. This short Q&A is an attempt to address that and
outline a direction of where BPF is heading long term.

.. contents::
    :local:
    :depth: 3

Questions and Answers
=====================

Q: Is BPF a generic instruction set similar to x64 and arm64?
-------------------------------------------------------------
A: NO.

Q: Is BPF a generic virtual machine?
-------------------------------------
A: NO.

BPF is a generic instruction set *with* C calling convention.
-------------------------------------------------------------

Q: Why was the C calling convention chosen?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Because BPF programs are designed to run in the Linux kernel,
which is written in C. Hence BPF defines an instruction set compatible
with the two most used architectures, x64 and arm64 (and takes into
consideration important quirks of other architectures), and
defines a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as a return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. The BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set
(unlike the x64 ISA, which allows msft, cdecl and other conventions).

Q: Can BPF programs access instruction pointer or return address?
-----------------------------------------------------------------
A: NO.

Q: Can BPF programs access stack pointer?
------------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view it's necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF backend, but it makes sure that generated code never uses it.

Q: Does the C calling convention diminish possible use cases?
-------------------------------------------------------------
A: YES.

The BPF design forces the addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps, with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs, which are indistinguishable from
native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until the BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections, and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?
----------------------------------------
A: It's not clear yet.

BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in fewer than 4096 instructions.

Instruction level questions
---------------------------

Q: LD_ABS and LD_IND instructions vs C code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.

Q: BPF instructions mapping not one-to-one to native CPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: It seems not all BPF instructions map one-to-one to native CPU
instructions. For example, why are BPF_JNE and other compare-and-jump
instructions not CPU-like?

A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

Q: Why doesn't the BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked a one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. Also, it
needs a div-by-zero runtime check.

Q: Why is there no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. LLVM errors in such a case and
prints a suggestion to use unsigned divide instead.

Q: Why does BPF have implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because architectures like sparc have register windows and in general
there are enough subtle differences between architectures that naively
storing the return address on the stack won't work. Another reason is
that BPF has to be safe from division by zero (and the legacy exception
path of the LD_ABS insn). Those instructions need to invoke the epilogue
and return implicitly.
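
The two points above (a BPF divide needs a divide-by-zero runtime check,
and hitting that check makes the program leave through the implicit
epilogue) can be pictured with a toy model in plain C. This is a sketch,
not kernel code: toy_op, toy_insn, toy_run, the 11-entry register file and
the choice of 0 as the value returned by an aborted run are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_REGS 11 /* R0..R10, mirroring the register names used above */

/* Hypothetical two-opcode instruction set, just enough for the sketch. */
enum toy_op { TOY_DIV, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    unsigned int dst; /* destination register index */
    unsigned int src; /* source register index */
};

/*
 * Run a toy program and return the value the caller would see.
 * A zero divisor never reaches the native divide; the run ends at once,
 * as if an implicit epilogue had been invoked.
 */
static uint64_t toy_run(const struct toy_insn *prog, size_t len,
                        uint64_t regs[TOY_NR_REGS])
{
    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *insn = &prog[pc];

        assert(insn->dst < TOY_NR_REGS && insn->src < TOY_NR_REGS);

        switch (insn->op) {
        case TOY_DIV:
            if (regs[insn->src] == 0)
                return 0; /* assumed abort value; remaining insns are skipped */
            regs[insn->dst] /= regs[insn->src]; /* unsigned divide only */
            break;
        case TOY_EXIT:
            return regs[0]; /* explicit exit: R0 carries the return value */
        }
    }
    return regs[0];
}
```

Running the two-instruction program {TOY_DIV R0, R1; TOY_EXIT} with R0 = 42
and R1 = 6 returns 7. With R1 = 0 the run stops at the check, returns 0 and
leaves R0 untouched.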

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because classic BPF didn't have them and BPF authors felt that a compiler
workaround would be acceptable. It turned out that programs lose performance
due to the lack of these compare instructions, so they were added.
These two instructions are a perfect example of what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have a one-to-one mapping to HW instructions
will not be accepted.

Q: BPF 32-bit subregister requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of
BPF registers, which makes BPF an inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?

A: NO. The first thing to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. Then the second step
is to teach the verifier to mark operations where zeroing the upper bits
is unnecessary.
Then JITs can take advantage of those markings and
drastically reduce the size of generated code and improve performance.

Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, the set of helper
functions and their arguments, and recognized return codes are all part
of the ABI. However, there is one specific exception for tracing programs
which use helpers like bpf_probe_read() to walk kernel internal
data structures and compile with kernel internal headers. Both of these
kernel internals are subject to change and can break with newer kernels
such that the program needs to be adapted accordingly.

Q: How much stack space does a BPF program use?
-----------------------------------------------
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used
and both the interpreter and most JITed code consume only the
necessary amount.

Q: Can BPF be offloaded to HW?
------------------------------
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does the classic BPF interpreter still exist?
------------------------------------------------
A: NO.
Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
A: NO.

Tracing BPF programs can *read* arbitrary memory with the bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
-------------------------------------------
A: Sort-of.

Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such a
program is loaded the kernel will print a warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

Q: bpf_trace_printk() helper warning
------------------------------------
Q: When the bpf_trace_printk() helper is used, the kernel prints a nasty
warning message. Why is that?

A: This is done to nudge program authors into better interfaces when
programs need to pass data to user space. For example,
bpf_perf_event_output() can be used to efficiently stream data via the
perf ring buffer. BPF maps can be used for asynchronous data sharing
between kernel and user space. bpf_trace_printk() should only be used
for debugging.

Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc. be added out of kernel module code?

A: NO.