==============
BPF Design Q&A
==============

BPF's extensibility and its applicability to networking, tracing, and security
in the Linux kernel, together with several user space implementations of the
BPF virtual machine, have led to a number of misunderstandings about what BPF
actually is. This short Q&A is an attempt to address that and to outline
where BPF is heading in the long term.

.. contents::
    :local:
    :depth: 3

Questions and Answers
=====================

Q: Is BPF a generic instruction set similar to x64 and arm64?
-------------------------------------------------------------
A: NO.

Q: Is BPF a generic virtual machine?
------------------------------------
A: NO.

BPF is a generic instruction set *with* C calling convention.
-------------------------------------------------------------

Q: Why was the C calling convention chosen?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Because BPF programs are designed to run in the Linux kernel,
which is written in C. Hence BPF defines an instruction set compatible
with the two most used architectures, x64 and arm64 (and takes into
consideration important quirks of other architectures), and
defines a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as the return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. The BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set
(unlike the x64 ISA, which allows msft, cdecl and other conventions).

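One common way to live within the five-argument limit is to bundle values
into a structure and pass a single pointer. The following is a minimal
plain-C sketch of that idea, not BPF-specific code; ``struct flow_args`` and
``sum_fields()`` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical argument bundle; the field names are illustrative only. */
struct flow_args {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t  proto;
    uint32_t mark;
    uint32_t ifindex;
};

/* Takes one pointer argument (R1 in BPF terms) instead of seven scalars. */
uint64_t sum_fields(const struct flow_args *a)
{
    return (uint64_t)a->saddr + a->daddr + a->sport + a->dport +
           a->proto + a->mark + a->ifindex;
}
```

The callee reads as many fields as it needs through the pointer, so the
number of logical inputs is no longer tied to the number of argument
registers.
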
Q: Can BPF programs access the instruction pointer or return address?
---------------------------------------------------------------------
A: NO.

Q: Can BPF programs access the stack pointer?
---------------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view it's necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF backend, but it makes sure that generated code never uses it.

Q: Does the C calling convention diminish possible use cases?
-------------------------------------------------------------
A: YES.

The BPF design forces the addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps, with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs, which are indistinguishable from
native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until the BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?
----------------------------------------
A: It's not clear yet.

BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in fewer than 4096 instructions.

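As an illustration of what a bounded loop could look like at the C level,
the sketch below caps the trip count with a compile-time constant, so the
number of iterations is known regardless of input. ``MAX_ITER`` and
``bounded_sum()`` are hypothetical names; whether a given verifier accepts
such a loop, or requires the compiler to unroll it, is not something this
sketch establishes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ITER 16 /* hypothetical compile-time upper bound */

/* Sums at most MAX_ITER bytes; the loop can never run longer than that. */
uint32_t bounded_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < MAX_ITER; i++) {
        if (i >= len)
            break; /* input shorter than the bound */
        sum += buf[i];
    }
    return sum;
}
```

Because the bound does not depend on run-time data, the worst-case
instruction count can be computed statically.
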
Instruction level questions
---------------------------

Q: LD_ABS and LD_IND instructions vs C code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.

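Direct packet access replaces LD_ABS/LD_IND with ordinary pointer loads
guarded by an explicit bounds check against the end of the packet. The
sketch below models that pattern in plain C; ``struct pkt_ctx`` and
``load_be16()`` are hypothetical stand-ins, not kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a program context with packet bounds. */
struct pkt_ctx {
    const uint8_t *data;     /* first byte of the packet */
    const uint8_t *data_end; /* one past the last byte */
};

/* Returns the big-endian 16-bit value at 'off', or -1 if out of bounds. */
int32_t load_be16(const struct pkt_ctx *ctx, size_t off)
{
    size_t len = (size_t)(ctx->data_end - ctx->data);

    /* Bounds check first: both bytes must lie inside the packet. */
    if (off > len || len - off < 2)
        return -1;

    return (int32_t)((ctx->data[off] << 8) | ctx->data[off + 1]);
}
```

The check comes before any load, which is the property a verifier needs
to see in order to prove the access safe.
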
Q: BPF instructions mapping not one-to-one to native CPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: It seems that not all BPF instructions map one-to-one to native CPU
instructions. For example, why are BPF_JNE and other compare-and-jump
instructions not CPU-like?

A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

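To make the answer concrete, here is a small C model of a fused
compare-and-jump: the comparison and the branch decision happen in one
step, and no condition flags survive the instruction. It assumes the
convention that the jump offset is relative to the following instruction;
``next_pc_jne()`` is a hypothetical helper, not kernel code.

```c
#include <stdint.h>

/*
 * Model of a "jump if not equal" instruction at index 'pc'.
 * Returns the index of the next instruction to execute.
 */
int32_t next_pc_jne(int32_t pc, uint64_t dst, uint64_t src, int16_t off)
{
    if (dst != src)
        return pc + 1 + off; /* taken: offset counts from the next insn */
    return pc + 1;           /* not taken: fall through */
}
```

A JIT is then free to lower this to whatever compare and branch sequence
the target CPU provides, with or without a flags register.
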
Q: Why doesn't the BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked a one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. It also
needs a div-by-zero runtime check.

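A program can also make the zero case explicit instead of relying on the
runtime check. The following is a minimal sketch in plain C;
``safe_div_u64()`` and its ``fallback`` parameter are hypothetical, and the
sketch says nothing about what the kernel itself does when the divisor is
zero.

```c
#include <stdint.h>

/* Unsigned divide that never executes a division by zero. */
uint64_t safe_div_u64(uint64_t dividend, uint64_t divisor, uint64_t fallback)
{
    if (divisor == 0)
        return fallback; /* caller-chosen result for the zero case */
    return dividend / divisor;
}
```

With the check in the program, the result for a zero divisor is chosen by
the author rather than by the execution environment.
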
Q: Why is there no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. LLVM errors in such a case and
prints a suggestion to use unsigned divide instead.

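Where a signed quotient is genuinely needed, it can be built on top of an
unsigned divide by handling the signs separately. The sketch below is one
way to do that in plain C; ``sdiv_via_udiv()`` is a hypothetical name, the
caller is assumed to guarantee a non-zero divisor, and the final conversion
assumes the usual two's complement behaviour of GCC and Clang.

```c
#include <stdint.h>

/* Signed divide (truncating toward zero) using only an unsigned divide. */
int64_t sdiv_via_udiv(int64_t a, int64_t b)
{
    /* Magnitudes, computed in unsigned arithmetic to avoid overflow. */
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    uint64_t uq = ua / ub; /* assumption: b != 0 */

    /* The quotient is negative when exactly one operand is negative. */
    int negative = (a < 0) != (b < 0);

    return negative ? (int64_t)(0 - uq) : (int64_t)uq;
}
```

The only divide in the function is unsigned, which is the form the answer
above says the compiler suggests.
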
Q: Why does BPF have an implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because architectures like sparc have register windows, and in general
there are enough subtle differences between architectures that naively
storing the return address on the stack won't work. Another reason is that
BPF has to be safe from division by zero (and the legacy exception path
of the LD_ABS insn). Those instructions need to invoke the epilogue and
return implicitly.

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because classic BPF didn't have them and the BPF authors felt that a
compiler workaround would be acceptable. It turned out that programs lose
performance due to the lack of these compare instructions, and they were added.
These two instructions are a perfect example of what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have a one-to-one mapping to HW instructions
will not be accepted.

Q: BPF 32-bit subregister requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of
BPF registers, which makes BPF an inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?

A: NO. The first step to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. The second step
is to teach the verifier to mark operations where zeroing the upper bits
is unnecessary. JITs can then take advantage of those markings and
drastically reduce the size of generated code and improve performance.

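The zero-extension requirement from the question can be modelled in a few
lines of C. This is only a sketch of the semantics, with the hypothetical
name ``alu32_add()``; it is not how any particular JIT implements it.

```c
#include <stdint.h>

/*
 * Model of a 32-bit add on 64-bit registers: only the low halves take
 * part, and the upper 32 bits of the result are always zero.
 */
uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    uint32_t lo = (uint32_t)dst + (uint32_t)src; /* wraps modulo 2^32 */

    return (uint64_t)lo; /* zero-extend into the full register */
}
```

On a 32-bit target the final zero-extension costs an extra instruction
unless something proves that the upper half is never read, which is what
the verifier markings described above are meant to provide.
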
Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, the set of helper
functions and their arguments, and recognized return codes are all part
of the ABI. However, when tracing programs use the bpf_probe_read() helper
to walk kernel-internal data structures and are compiled with kernel-internal
headers, these accesses can and will break with newer
kernels. The kern_version field of union bpf_attr is checked at load time
to prevent accidentally loading kprobe-based bpf programs written
for a different kernel. Networking programs don't do the kern_version check.

Q: How much stack space does a BPF program use?
-----------------------------------------------
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used,
and both the interpreter and most JITed code consume only the necessary amount.

Q: Can BPF be offloaded to HW?
------------------------------
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does the classic BPF interpreter still exist?
------------------------------------------------
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions, which
is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
A: NO.

Tracing bpf programs can *read* arbitrary memory with the bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
-------------------------------------------
A: Sort-of.

Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such a
program is loaded the kernel will print a warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

Q: bpf_trace_printk() helper warning
------------------------------------
Q: When the bpf_trace_printk() helper is used the kernel prints a nasty
warning message. Why is that?

A: This is done to nudge program authors into better interfaces when
programs need to pass data to user space. For example, bpf_perf_event_output()
can be used to efficiently stream data via the perf ring buffer.
BPF maps can be used for asynchronous data sharing between kernel
and user space. bpf_trace_printk() should only be used for debugging.

Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc. be added out of kernel module code?

A: NO.