==============
BPF Design Q&A
==============

BPF extensibility and applicability to networking, tracing, and security
in the Linux kernel, together with several user space implementations of
the BPF virtual machine, have led to a number of misunderstandings about
what BPF actually is. This short Q&A is an attempt to address that and to
outline where BPF is heading in the long term.

.. contents::
    :local:
    :depth: 3

Questions and Answers
=====================

Q: Is BPF a generic instruction set similar to x64 and arm64?
-------------------------------------------------------------
A: NO.

Q: Is BPF a generic virtual machine?
------------------------------------
A: NO.

BPF is a generic instruction set *with* C calling convention.
-------------------------------------------------------------

Q: Why was C calling convention chosen?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Because BPF programs are designed to run in the Linux kernel,
which is written in C. Hence BPF defines an instruction set compatible
with the two most used architectures, x64 and arm64 (and takes into
consideration important quirks of other architectures), and
defines a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as a return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. The BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set
(unlike the x64 ISA, which allows msft, cdecl and other conventions).

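
The limits above (five arguments in R1-R5, one return value in R0) can be
illustrated with a small userspace model. This is only a sketch in plain C,
not kernel code: ``five_arg_fn``, ``struct sum_args``, ``sum_many`` and
``call_with_seven_values`` are hypothetical names. It shows one common way
to live within five arguments, namely passing a pointer to a structure.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: a callee following the convention takes at most
 * five 64-bit arguments (R1-R5) and returns one 64-bit value (R0). */
typedef uint64_t (*five_arg_fn)(uint64_t r1, uint64_t r2, uint64_t r3,
                                uint64_t r4, uint64_t r5);

/* Hypothetical argument block for when more than five values are needed. */
struct sum_args {
    uint64_t v[7];
    uint64_t n;      /* number of valid entries in v[] */
};

/* Receives a pointer to struct sum_args in r1; r2-r5 are unused. */
static uint64_t sum_many(uint64_t r1, uint64_t r2, uint64_t r3,
                         uint64_t r4, uint64_t r5)
{
    const struct sum_args *a = (const struct sum_args *)(uintptr_t)r1;
    uint64_t total = 0;

    (void)r2; (void)r3; (void)r4; (void)r5;
    assert(a->n <= 7);
    for (uint64_t i = 0; i < a->n; i++)
        total += a->v[i];
    return total;    /* the single return value, as in R0 */
}

/* Packs seven values into one structure and passes its address. */
static uint64_t call_with_seven_values(void)
{
    struct sum_args a = { .v = {1, 2, 3, 4, 5, 6, 7}, .n = 7 };
    five_arg_fn fn = sum_many;

    return fn((uint64_t)(uintptr_t)&a, 0, 0, 0, 0);
}
```

The same idea covers multiple results: the callee can write them through a
pointer argument and keep the single return value for a status code.
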
Q: Can BPF programs access instruction pointer or return address?
-----------------------------------------------------------------
A: NO.

Q: Can BPF programs access stack pointer?
-----------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view it is necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF backend, but it makes sure that the generated code never uses it.

Q: Does C-calling convention diminish possible use cases?
---------------------------------------------------------
A: YES.

BPF design forces the addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs, which are indistinguishable from
native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections, and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?
----------------------------------------
A: It's not clear yet.

BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in fewer than 4096 instructions.

Instruction level questions
---------------------------

Q: LD_ABS and LD_IND instructions vs C code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.

Q: BPF instructions mapping not one-to-one to native CPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: It seems that not all BPF instructions map one-to-one to native CPU
instructions. For example, why are BPF_JNE and other compare-and-jump
instructions not CPU-like?

A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

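
As an illustration of a flag-free compare-and-jump, here is a minimal
userspace model of a "jump if not equal" step. It is a sketch only:
``struct vm``, ``jne_reg`` and ``pc_after_jne`` are hypothetical names and
not kernel API. The comparison and the branch happen in one step, so no
condition flags need to be carried from one instruction to the next.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only; all names here are hypothetical. */
struct vm {
    uint64_t reg[11];   /* R0-R10 */
    int pc;             /* index of the next instruction to execute */
};

/* Fused "jump if not equal": compare two registers and branch at once. */
static void jne_reg(struct vm *vm, int dst, int src, int16_t off)
{
    assert(dst >= 0 && dst <= 10 && src >= 0 && src <= 10);
    vm->pc += 1;                       /* fall through by default */
    if (vm->reg[dst] != vm->reg[src])
        vm->pc += off;                 /* offset is relative to next insn */
}

/* Runs one JNE on R1 and R2 from pc 0 and reports the resulting pc. */
static int pc_after_jne(uint64_t a, uint64_t b, int16_t off)
{
    struct vm vm = { .pc = 0 };

    vm.reg[1] = a;
    vm.reg[2] = b;
    jne_reg(&vm, 1, 2, off);
    return vm.pc;
}
```

With unequal operands and an offset of 3 the model lands on instruction 4.
With equal operands it falls through to instruction 1.
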
Q: Why doesn't BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked a one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. Also it
needs a div-by-zero runtime check.

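
The runtime check mentioned above can be sketched in plain C. The answer
only says that a check is needed, not what happens when it fires, so the
result of 0 for a zero divisor is an assumption of this sketch, and
``checked_udiv64`` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit divide guarded against a zero divisor.
 * Assumption for this sketch: a zero divisor yields 0. */
static uint64_t checked_udiv64(uint64_t dst, uint64_t src)
{
    if (src == 0)
        return 0;
    return dst / src;
}

/* Small self-check showing the intended behaviour. */
static void checked_udiv64_examples(void)
{
    assert(checked_udiv64(10, 3) == 3);
    assert(checked_udiv64(10, 0) == 0);   /* guarded, no CPU trap */
}
```

For comparison, the x64 ``div`` instruction uses implicit registers and
raises a divide error on a zero divisor, which is why a plain one-to-one
mapping would not be safe by itself.
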
Q: Why is there no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. LLVM errors out in such a case and
prints a suggestion to use unsigned divide instead.

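
When signed semantics are really needed, one possible workaround is to
build them from an unsigned divide in C. The sketch below is an
illustration, not something LLVM emits for you; ``sdiv_via_udiv`` is a
hypothetical name and the caller must guarantee a non-zero divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating signed division built from a single unsigned divide.
 * Precondition: b != 0. */
static int64_t sdiv_via_udiv(int64_t a, int64_t b)
{
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    uint64_t q;

    assert(b != 0);
    q = ua / ub;                  /* the only divide is unsigned */
    if ((a < 0) != (b < 0))
        q = 0 - q;                /* reapply the sign */
    /* Conversion back relies on two's complement representation. */
    return (int64_t)q;
}
```

The quotient rounds toward zero, matching the C ``/`` operator on signed
integers.
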
Q: Why does BPF have implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because architectures like sparc have register windows and in general
there are enough subtle differences between architectures that a naive
"store the return address on the stack" approach won't work. Another
reason is that BPF has to be safe from division by zero (and the legacy
exception path of the LD_ABS insn). Those instructions need to invoke the
epilogue and return implicitly.

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because classic BPF didn't have them and BPF authors felt that a compiler
workaround would be acceptable. It turned out that programs lose performance
due to the lack of these compare instructions, so they were added.
These two instructions are a perfect example of what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have a one-to-one mapping to HW instructions
will not be accepted.

Q: BPF 32-bit subregister requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of
BPF registers, which makes BPF an inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?

A: NO. The first step to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. The second step
is to teach the verifier to mark operations where zeroing the upper bits
is unnecessary. Then JITs can take advantage of those markings and
drastically reduce the size of generated code and improve performance.

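
The zeroing requirement in the question can be shown with a small
userspace model of a 32-bit add on a 64-bit register. This is a sketch
only; ``alu32_add`` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit add on 64-bit registers: only the low halves take part and
 * the upper 32 bits of the result are always zero. */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    uint32_t lo = (uint32_t)dst + (uint32_t)src;   /* wraps modulo 2^32 */

    return (uint64_t)lo;                           /* zero-extended */
}

/* Small self-check showing the intended behaviour. */
static void alu32_add_examples(void)
{
    /* Stale upper bits of dst do not leak into the result. */
    assert(alu32_add(0xFFFFFFFF00000001ULL, 1) == 2);
    /* Overflow wraps inside 32 bits instead of carrying into bit 32. */
    assert(alu32_add(0xFFFFFFFFULL, 1) == 0);
}
```

On a 32-bit CPU the explicit clearing of the upper half costs an extra
instruction per operation, which is the overhead the verifier markings
described above are meant to remove.
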
Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, the set of helper
functions and their arguments, and recognized return codes are all part
of the ABI. However, there is one specific exception for tracing programs
which use helpers like bpf_probe_read() to walk kernel internal
data structures and are compiled with kernel internal headers. Both of
these kernel internals are subject to change and can break with newer
kernels, in which case the program needs to be adapted accordingly.

Q: How much stack space does a BPF program use?
-----------------------------------------------
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used,
and both the interpreter and most JITed code consume only the
necessary amount.

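
A simplified userspace model of the 512-byte limit is sketched below. It
assumes that stack slots are addressed at negative offsets from the frame
pointer (R10). ``MODEL_STACK_SIZE``, ``stack_access_ok`` and
``stack_depth`` are hypothetical names, and the real verifier checks far
more than this.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_STACK_SIZE 512   /* limit quoted in the text */

/* True if an access of `size` bytes at frame-pointer offset `off`
 * stays inside [fp - 512, fp). */
static bool stack_access_ok(int off, int size)
{
    if (size <= 0)
        return false;
    return off >= -MODEL_STACK_SIZE && off < 0 && size <= -off;
}

/* Deepest byte reached by a set of accesses: a toy version of the
 * "actual amount of stack used" figure. All offsets must be negative. */
static int stack_depth(const int *offs, int n)
{
    int depth = 0;

    for (int i = 0; i < n; i++) {
        assert(offs[i] < 0);
        if (-offs[i] > depth)
            depth = -offs[i];
    }
    return depth;
}
```

A program whose deepest access is at offset -24 needs only 24 bytes in
this model, even though up to 512 would be permitted.
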
Q: Can BPF be offloaded to HW?
------------------------------
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does classic BPF interpreter still exist?
--------------------------------------------
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call helper functions from a set that
is defined for each program type.

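
The per-program-type rule can be pictured as an allow-list lookup. The
sketch below is a userspace model with made-up identifiers
(``enum prog_type``, ``enum helper_id``, ``helper_allowed``). It does not
reproduce the kernel's real enums or its actual lookup mechanism.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for this model; not the kernel's enums. */
enum prog_type { PROG_NETWORKING, PROG_TRACING, PROG_TYPE_COUNT };
enum helper_id { HELPER_MAP_LOOKUP, HELPER_PROBE_READ, HELPER_COUNT };

/* One row per program type, one column per helper. */
static const bool allowed[PROG_TYPE_COUNT][HELPER_COUNT] = {
    [PROG_NETWORKING] = { [HELPER_MAP_LOOKUP] = true },
    [PROG_TRACING]    = { [HELPER_MAP_LOOKUP] = true,
                          [HELPER_PROBE_READ] = true },
};

/* A call is permitted only if the helper is listed for that type. */
static bool helper_allowed(enum prog_type t, enum helper_id h)
{
    assert((int)t >= 0 && (int)t < PROG_TYPE_COUNT);
    assert((int)h >= 0 && (int)h < HELPER_COUNT);
    return allowed[t][h];
}
```

In this model a tracing program may use the probe-read helper while a
networking program may not, and anything absent from the table is rejected.
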
Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
A: NO.

Tracing BPF programs can *read* arbitrary memory with the bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
-------------------------------------------
A: Sort-of.

Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such a
program is loaded the kernel will print a warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

Q: bpf_trace_printk() helper warning
------------------------------------
Q: When the bpf_trace_printk() helper is used the kernel prints a nasty
warning message. Why is that?

A: This is done to nudge program authors into better interfaces when
programs need to pass data to user space. For example,
bpf_perf_event_output() can be used to efficiently stream data via the
perf ring buffer. BPF maps can be used for asynchronous data sharing
between kernel and user space. bpf_trace_printk() should only be used
for debugging.

Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc. be added out of kernel module code?

A: NO.