BPF's extensibility and its applicability to networking, tracing and security
in the Linux kernel, along with several user space implementations of the BPF
virtual machine, have led to a number of misunderstandings about what BPF
actually is. This short QA is an attempt to address them and to outline
where BPF is heading in the long term.

Q: Is BPF a generic instruction set similar to x64 and arm64?
A: NO.

Q: Is BPF a generic virtual machine?
A: NO.

BPF is a generic instruction set _with_ the C calling convention.

Q: Why was the C calling convention chosen?
A: Because BPF programs are designed to run in the Linux kernel,
   which is written in C. Hence BPF defines an instruction set compatible
   with the two most used architectures, x64 and arm64 (and takes into
   consideration important quirks of other architectures), and
   defines a calling convention that is compatible with the C calling
   convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
A: NO. BPF allows only register R0 to be used as the return value.

Q: Can more than 5 function arguments be supported in the future?
A: NO. The BPF calling convention only allows registers R1-R5 to be used
   as arguments. BPF is not a standalone instruction set
   (unlike the x64 ISA, which allows msft, cdecl and other conventions).
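
As a rough illustration of the convention described above, the following
user-space C sketch models a call: arguments are taken from R1-R5 and the
result is written to R0. The names bpf_regs, bpf_helper_fn, demo_sum3 and
bpf_call are hypothetical and exist only for this example; this is not
kernel code.

```c
#include <stdint.h>

/* Hypothetical user-space model of the BPF register file: R0..R10. */
struct bpf_regs {
    uint64_t r[11];
};

/* Every callee has the same C shape: five 64-bit arguments, one 64-bit result. */
typedef uint64_t (*bpf_helper_fn)(uint64_t, uint64_t, uint64_t,
                                  uint64_t, uint64_t);

/* Illustrative callee (not a real kernel helper): sums its first three arguments. */
static uint64_t demo_sum3(uint64_t a, uint64_t b, uint64_t c,
                          uint64_t d, uint64_t e)
{
    (void)d;
    (void)e;
    return a + b + c;
}

/* Model of a call: arguments come from R1-R5, the single result lands in R0. */
static void bpf_call(struct bpf_regs *regs, bpf_helper_fn fn)
{
    regs->r[0] = fn(regs->r[1], regs->r[2], regs->r[3],
                    regs->r[4], regs->r[5]);
}
```

A callee that needs fewer than five arguments simply ignores the rest, as
demo_sum3 does here.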

Q: Can BPF programs access the instruction pointer or return address?
A: NO.

Q: Can BPF programs access the stack pointer?
A: NO. Only the frame pointer (register R10) is accessible.
   From the compiler's point of view it's necessary to have a stack pointer.
   For example, LLVM defines register R11 as the stack pointer in its
   BPF backend, but it makes sure that generated code never uses it.

Q: Does the C calling convention diminish possible use cases?
A: YES. The BPF design forces the addition of major functionality in the form
   of kernel helper functions and kernel objects like BPF maps, with
   seamless interoperability between them. It lets the kernel call into
   BPF programs and programs call kernel helpers with zero overhead,
   as if all of them were native C code. That is particularly the case
   for JITed BPF programs, which are indistinguishable from
   native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
A: Soft yes. At least for now, until the BPF core has support for
   bpf-to-bpf calls, indirect calls, loops, global variables,
   jump tables, read-only sections and all other normal constructs
   that C code can produce.

Q: Can loops be supported in a safe way?
A: It's not clear yet. BPF developers are trying to find a way to
   support bounded loops where the verifier can guarantee that
   the program terminates in fewer than 4096 instructions.

Q: How come the LD_ABS and LD_IND instructions are present in BPF whereas
   C code cannot express them and has to use builtin intrinsics?
A: This is an artifact of compatibility with classic BPF. Modern
   networking code in BPF performs better without them.
   See 'direct packet access'.

Q: It seems not all BPF instructions map one-to-one to native CPU instructions.
   For example, why are BPF_JNE and other compare-and-jump instructions not CPU-like?
A: This was necessary to avoid introducing flags into the ISA, which are
   impossible to make generic and efficient across CPU architectures.

Q: Why doesn't the BPF_DIV instruction map to x64 div?
A: Because if we had picked a one-to-one relationship to x64 it would have made
   it more complicated to support on arm64 and other archs. Also, it
   needs a div-by-zero runtime check.
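
The runtime check can be pictured with a small user-space model. This is a
sketch only: bpf_div64 is a hypothetical name, and signalling a zero divisor
by returning false merely stands in for the implicit return described
elsewhere in this QA. The exact kernel behaviour is not specified here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a 64-bit unsigned divide step.
 * Returns false when the divisor is zero, in which case the caller is
 * expected to abandon the program instead of executing a faulting divide.
 */
static bool bpf_div64(uint64_t *dst, uint64_t src)
{
    if (src == 0)
        return false;   /* runtime check: dst is left untouched */
    *dst /= src;        /* plain unsigned division otherwise */
    return true;
}
```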

Q: Why is there no BPF_SDIV for the signed divide operation?
A: Because it would be rarely used. LLVM errors out in such a case and
   prints a suggestion to use unsigned divide instead.

Q: Why does BPF have an implicit prologue and epilogue?
A: Because architectures like sparc have register windows, and in general
   there are enough subtle differences between architectures that naively
   storing the return address on the stack won't work. Another reason is
   that BPF has to be safe from division by zero (and the legacy exception
   path of the LD_ABS insn). Those instructions need to invoke the epilogue
   and return implicitly.

Q: Why were the BPF_JLT and BPF_JLE instructions not introduced in the beginning?
A: Because classic BPF didn't have them and the BPF authors felt that a compiler
   workaround would be acceptable. It turned out that programs lose performance
   due to the lack of these compare instructions, and they were added.
   These two instructions are a perfect example of what kind of new BPF
   instructions are acceptable and can be added in the future.
   These two already had equivalent instructions in native CPUs.
   New instructions that don't have a one-to-one mapping to HW instructions
   will not be accepted.
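
The answer above does not spell out the compiler workaround. One plausible
reading, shown here as an assumption, is that "x < imm" was rewritten as
"imm > x", which requires loading the immediate into a register first. The
user-space C sketch below models that; jgt, jlt and lt_imm_workaround are
hypothetical names, not kernel or LLVM functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of an unsigned "jump if greater than": taken when dst > src. */
static bool jgt(uint64_t dst, uint64_t src)
{
    return dst > src;
}

/* Model of an unsigned "jump if less than": taken when dst < src. */
static bool jlt(uint64_t dst, uint64_t src)
{
    return dst < src;
}

/*
 * Assumed workaround without a less-than jump: swap the operands.
 * The copy into tmp stands for the extra instruction that loads the
 * immediate into a register before the comparison.
 */
static bool lt_imm_workaround(uint64_t x, uint64_t imm)
{
    uint64_t tmp = imm;
    return jgt(tmp, x);
}
```

Both forms give the same branch decision; the difference is only the extra
register load in the swapped version.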

Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of BPF
   registers, which makes BPF an inefficient virtual machine for 32-bit
   CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
   be added to BPF in the future?
A: NO. The first step to improve performance on 32-bit archs is to teach
   LLVM to generate code that uses 32-bit subregisters. The second step
   is to teach the verifier to mark operations where zeroing the upper bits
   is unnecessary. JITs can then take advantage of those markings and
   drastically reduce the size of generated code and improve performance.
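
The zeroing requirement in the question can be shown with a tiny user-space
model. alu32_add is a hypothetical name; the sketch assumes only what the
question states, namely that a 32-bit operation leaves the upper 32 bits of
the 64-bit register cleared.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit add on 64-bit registers.
 * Only the low 32 bits of each operand take part; the result is
 * zero-extended, so the upper 32 bits of the destination become 0.
 */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    uint32_t lo = (uint32_t)dst + (uint32_t)src;  /* wraps modulo 2^32 */
    return (uint64_t)lo;                          /* upper half is zero */
}
```

On a 32-bit CPU a JIT would have to emit an extra instruction to clear the
upper half each time, which is the cost the verifier markings are meant to
remove.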

Q: Does BPF have a stable ABI?
A: YES. BPF instructions, arguments to BPF programs, the set of helper
   functions and their arguments, and recognized return codes are all part
   of the ABI. However, when tracing programs use the bpf_probe_read() helper
   to walk kernel internal data structures and are compiled with kernel
   internal headers, these accesses can and will break with newer
   kernels. The kern_version field of union bpf_attr is checked at load time
   to prevent accidentally loading kprobe-based BPF programs written
   for a different kernel. Networking programs don't do the kern_version check.

Q: How much stack space does a BPF program use?
A: Currently all program types are limited to 512 bytes of stack
   space, but the verifier computes the actual amount of stack used,
   and both the interpreter and most JITed code consume only the
   necessary amount.
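
One way to picture how the "actual amount of stack used" could be derived is
sketched below in user-space C. It assumes stack slots are addressed at
negative offsets from the frame pointer R10; stack_depth is a hypothetical
function, and the real verifier logic is considerably more involved.

```c
/* Stack limit taken from the answer above: 512 bytes per program. */
#define MAX_BPF_STACK 512

/*
 * offs holds the frame-pointer-relative offsets a program touches
 * (assumed negative, e.g. -8 means "8 bytes below R10").
 * Returns the deepest offset used, in bytes, or -1 if any access
 * falls outside the permitted stack area.
 */
static int stack_depth(const int *offs, int n)
{
    int depth = 0;

    for (int i = 0; i < n; i++) {
        if (offs[i] >= 0 || offs[i] < -MAX_BPF_STACK)
            return -1;              /* out of bounds: reject */
        if (-offs[i] > depth)
            depth = -offs[i];       /* track the deepest slot seen */
    }
    return depth;
}
```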

Q: Can BPF be offloaded to HW?
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does the classic BPF interpreter still exist?
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
A: NO. BPF programs can only call a set of helper functions, which
   is defined separately for every program type.
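
A per-program-type helper set can be modelled as a simple lookup table. The
user-space C sketch below is illustrative only: the enum names and
helper_allowed are hypothetical, and the table contents follow this QA's
statement that networking programs do not get bpf_probe_read() while tracing
programs do.

```c
#include <stdbool.h>

/* Hypothetical program types and helper identifiers, for illustration only. */
enum prog_type { PROG_NETWORKING, PROG_TRACING, PROG_TYPE_COUNT };
enum helper_id { HELPER_MAP_LOOKUP, HELPER_PROBE_READ, HELPER_COUNT };

/* One row per program type: which helpers that type may call. */
static const bool helper_table[PROG_TYPE_COUNT][HELPER_COUNT] = {
    [PROG_NETWORKING] = { [HELPER_MAP_LOOKUP] = true,
                          [HELPER_PROBE_READ] = false },
    [PROG_TRACING]    = { [HELPER_MAP_LOOKUP] = true,
                          [HELPER_PROBE_READ] = true },
};

/* A call is accepted only if the helper is listed for the program's type. */
static bool helper_allowed(enum prog_type type, enum helper_id helper)
{
    if ((unsigned)type >= (unsigned)PROG_TYPE_COUNT ||
        (unsigned)helper >= (unsigned)HELPER_COUNT)
        return false;
    return helper_table[type][helper];
}
```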

Q: Can BPF overwrite arbitrary kernel memory?
A: NO. Tracing BPF programs can _read_ arbitrary memory with the
   bpf_probe_read() and bpf_probe_read_str() helpers. Networking programs
   cannot read arbitrary memory, since they don't have access to these
   helpers. Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
A: Sort-of. Tracing BPF programs can overwrite the user memory
   of the current task with bpf_probe_write_user(). Every time such a
   program is loaded the kernel prints a warning message, so
   this helper is only useful for experiments and prototypes.
   Tracing BPF programs are root only.

Q: When the bpf_trace_printk() helper is used the kernel prints a nasty
   warning message. Why is that?
A: This is done to nudge program authors towards better interfaces when
   programs need to pass data to user space. For example,
   bpf_perf_event_output() can be used to efficiently stream data via
   the perf ring buffer, and BPF maps can be used for asynchronous data
   sharing between kernel and user space. bpf_trace_printk() should only
   be used for debugging.

Q: Can BPF functionality such as new program or map types, new
   helpers, etc. be added from kernel module code?
A: NO.