.. contents::
.. sectnum::

========================================
eBPF Instruction Set Specification, v1.0
========================================

This document specifies version 1.0 of the eBPF instruction set.

Documentation conventions
=========================

For brevity, this document uses the type notation "u64", "u32", etc.
to mean an unsigned integer whose width is the specified number of bits,
and "s32", etc. to mean a signed integer of the specified number of bits.

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The fields of an encoded basic instruction are stored in the
following order::

  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.

**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that the contents of multi-byte fields ('imm' and 'offset') are
stored using big-endian byte ordering in big-endian BPF and
little-endian byte ordering in little-endian BPF.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.
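
The field layout above can be sketched in Python as follows. This is only an
illustration, not part of the specification: the function name
``encode_basic`` and its parameters are hypothetical, and Python's standard
``struct`` module is used purely for byte packing.

.. code-block:: python

  import struct

  def encode_basic(opcode, dst_reg, src_reg, offset, imm, little_endian=True):
      """Pack one basic (64-bit) instruction into 8 bytes."""
      if little_endian:
          # Little-endian BPF: src_reg is the high nibble, dst_reg the low one.
          regs = (src_reg << 4) | dst_reg
          fmt = "<BBhi"
      else:
          # Big-endian BPF: dst_reg is the high nibble, src_reg the low one.
          regs = (dst_reg << 4) | src_reg
          fmt = ">BBhi"
      # 'offset' is a signed 16-bit field and 'imm' a signed 32-bit field.
      return struct.pack(fmt, opcode, regs, offset, imm)

  little = encode_basic(0x07, dst_reg=1, src_reg=0, offset=0, imm=0x11223344)
  big = encode_basic(0x07, dst_reg=1, src_reg=0, offset=0, imm=0x11223344,
                     little_endian=False)

Under these assumptions ``little.hex()`` is ``0701000044332211`` and
``big.hex()`` is ``0710000011223344``, which matches the byte sequences in the
example above.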

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

This is depicted in the following figure::

        basic_instruction
  .-----------------------------.
  |                             |
  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
                                 |              |
                                 '--------------'
                                pseudo instruction

Thus the 64-bit immediate value is constructed as follows::

  imm64 = (next_imm << 32) | imm

where 'next_imm' refers to the imm value of the pseudo instruction
following the basic instruction.  The unused bytes in the pseudo
instruction are reserved and shall be cleared to zero.
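
A minimal Python sketch of this construction is shown below. The function
name ``imm64`` is hypothetical. Because 'imm' is a signed field, the sketch
assumes that each half is taken as a 32-bit pattern before combining, so that
a negative low half does not spill into the upper 32 bits.

.. code-block:: python

  def imm64(imm, next_imm):
      """Combine the two 32-bit 'imm' fields of a wide instruction."""
      low = imm & 0xFFFFFFFF         # imm of the basic instruction
      high = next_imm & 0xFFFFFFFF   # imm of the pseudo instruction
      return (high << 32) | low

For example, ``imm64(0x55667788, 0x11223344)`` yields ``0x1122334455667788``.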

Instruction classes
-------------------

The three least significant bits (LSB) of the 'opcode' field store the
instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)
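
The three-way split can be illustrated with a few bit masks. The following
Python sketch is not part of the specification; the function name
``split_alu_jmp_opcode`` is hypothetical. The 'code' part is returned in its
in-opcode position (for example 0x70), not shifted down to a 4-bit value.

.. code-block:: python

  BPF_K, BPF_X = 0x00, 0x08
  BPF_ALU, BPF_JMP, BPF_JMP32, BPF_ALU64 = 0x04, 0x05, 0x06, 0x07

  def split_alu_jmp_opcode(opcode):
      """Return (code, source, instruction class) of an ALU or jump opcode."""
      code = opcode & 0xF0     # 4 most significant bits
      source = opcode & 0x08   # 1 bit: BPF_K or BPF_X
      cls = opcode & 0x07      # 3 least significant bits
      return code, source, cls

For example, opcode 0x07 splits into ``(0x00, BPF_K, BPF_ALU64)``, which is
the ``r1 += 0x11223344`` instruction from `Instruction encoding`_.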

Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

=========  =====  =======  ==========================================================
code       value  offset   description
=========  =====  =======  ==========================================================
BPF_ADD    0x00   0        dst += src
BPF_SUB    0x10   0        dst -= src
BPF_MUL    0x20   0        dst \*= src
BPF_DIV    0x30   0        dst = (src != 0) ? (dst / src) : 0
BPF_SDIV   0x30   1        dst = (src != 0) ? (dst s/ src) : 0
BPF_OR     0x40   0        dst \|= src
BPF_AND    0x50   0        dst &= src
BPF_LSH    0x60   0        dst <<= (src & mask)
BPF_RSH    0x70   0        dst >>= (src & mask)
BPF_NEG    0x80   0        dst = -dst
BPF_MOD    0x90   0        dst = (src != 0) ? (dst % src) : dst
BPF_SMOD   0x90   1        dst = (src != 0) ? (dst s% src) : dst
BPF_XOR    0xa0   0        dst ^= src
BPF_MOV    0xb0   0        dst = src
BPF_MOVSX  0xb0   8/16/32  dst = (s8,s16,s32)src
BPF_ARSH   0xc0   0        sign extending dst >>= (src & mask)
BPF_END    0xd0   0        byte swap operations (see `Byte swap instructions`_ below)
=========  =====  =======  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If eBPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.
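
The wrapping and divide-by-zero rules can be modelled as below. This is a
sketch under stated assumptions, not a reference implementation: registers are
modelled as non-negative Python integers holding 64-bit patterns, only the
unsigned ``BPF_ADD``, ``BPF_DIV`` and ``BPF_MOD`` operations are covered, and
the function names are hypothetical.

.. code-block:: python

  MASK32 = 0xFFFFFFFF
  MASK64 = 0xFFFFFFFFFFFFFFFF

  def alu_add(dst, src, is64):
      """BPF_ADD: the result wraps at 32 or 64 bits."""
      mask = MASK64 if is64 else MASK32
      return ((dst & mask) + (src & mask)) & mask

  def alu_div(dst, src, is64):
      """BPF_DIV: division by zero sets dst to zero."""
      mask = MASK64 if is64 else MASK32
      d, s = dst & mask, src & mask
      return d // s if s != 0 else 0

  def alu_mod(dst, src, is64):
      """BPF_MOD: modulo by zero keeps dst, truncated to the operand width."""
      mask = MASK64 if is64 else MASK32
      d, s = dst & mask, src & mask
      # For BPF_ALU, d was already masked, so the upper 32 bits are zeroed.
      return d % s if s != 0 else d

With ``dst = 0xAAAABBBB00000007`` and ``src = 0``, ``alu_mod`` returns ``dst``
unchanged when ``is64`` is true and ``7`` when it is false.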

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm32

Note that most instructions have an instruction offset of 0. Only three instructions
(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset.

The division and modulo operations support both unsigned and signed flavors.

For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``,
'imm' is first sign extended from 32 to 64 bits, and then interpreted as
a 64-bit unsigned value.

For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
is first sign extended from 32 to 64 bits, and then interpreted as a
64-bit signed value.
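
The four interpretations of 'imm' described above can be summarized in a
short Python sketch. The function name ``imm_operand`` is hypothetical, and
'imm' is assumed to be given as a Python integer in the signed 32-bit range.

.. code-block:: python

  def imm_operand(imm, is64, signed):
      """Interpret a signed 32-bit 'imm' as a division or modulo operand."""
      width = 64 if is64 else 32
      # Two's-complement bit pattern of imm at the operand width. For a
      # 64-bit operand this is exactly the sign extension from 32 to 64 bits.
      pattern = imm & ((1 << width) - 1)
      if signed and pattern >> (width - 1):
          return pattern - (1 << width)   # read the pattern as negative
      return pattern

For ``imm = -1``, the unsigned operand is ``0xFFFFFFFF`` for ``BPF_ALU`` and
``0xFFFFFFFFFFFFFFFF`` for ``BPF_ALU64``, while the signed operand is ``-1``
in both classes.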

The ``BPF_MOVSX`` instruction does a move operation with sign extension.
``BPF_ALU | BPF_MOVSX`` sign extends 8-bit and 16-bit operands into 32-bit
operands, and zeroes the remaining upper 32 bits.
``BPF_ALU64 | BPF_MOVSX`` sign extends 8-bit, 16-bit, and 32-bit
operands into 64-bit operands.
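
As an illustration, the sign extension can be written as follows. The
function name ``movsx`` is hypothetical; ``bits`` stands for the value carried
in the 'offset' field (8, 16 or 32), and registers are modelled as
non-negative Python integers.

.. code-block:: python

  MASK32 = 0xFFFFFFFF
  MASK64 = 0xFFFFFFFFFFFFFFFF

  def movsx(src, bits, is64):
      """Sign extend the low 'bits' bits of src into a 32- or 64-bit result."""
      value = src & ((1 << bits) - 1)
      if value >> (bits - 1):      # sign bit of the narrow operand is set
          value -= 1 << bits
      # BPF_ALU64 keeps 64 result bits; BPF_ALU keeps 32 and zeroes the rest.
      return value & (MASK64 if is64 else MASK32)

For example, ``movsx(0x80, 8, False)`` gives ``0xFFFFFF80`` and
``movsx(0x80, 8, True)`` gives ``0xFFFFFFFFFFFFFF80``.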

Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.
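
The effect of the shift mask can be sketched as below. The function names are
hypothetical and registers are modelled as non-negative Python integers;
``arsh`` corresponds to ``BPF_ARSH``.

.. code-block:: python

  def lsh(dst, src, is64):
      """BPF_LSH: dst <<= (src & mask), truncated to the operand width."""
      width = 64 if is64 else 32
      full = (1 << width) - 1
      return ((dst & full) << (src & (width - 1))) & full

  def rsh(dst, src, is64):
      """BPF_RSH: logical right shift."""
      width = 64 if is64 else 32
      full = (1 << width) - 1
      return (dst & full) >> (src & (width - 1))

  def arsh(dst, src, is64):
      """BPF_ARSH: right shift that replicates the sign bit."""
      width = 64 if is64 else 32
      full = (1 << width) - 1
      value = dst & full
      if value >> (width - 1):
          value -= 1 << width          # reinterpret the pattern as negative
      return (value >> (src & (width - 1))) & full

Because of the mask, a 64-bit shift by 64 behaves like a shift by 0, so
``lsh(1, 64, True)`` is ``1``, and a 32-bit shift by 33 behaves like a shift
by 1, so ``lsh(1, 33, False)`` is ``2``.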

Byte swap instructions
----------------------

The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64``
and a 4-bit 'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to
select what byte order the operation converts from or to. For
``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved
and must be set to 0.

=========  =========  =====  =================================================
class      source     value  description
=========  =========  =====  =================================================
BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big endian
BPF_ALU64  Reserved   0x00   do byte swap unconditionally
=========  =========  =====  =================================================

The 'imm' field encodes the width of the swap operations.  The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

  dst = htole16(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

  dst = htobe64(dst)

``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = bswap16 dst
  dst = bswap32 dst
  dst = bswap64 dst
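
The difference between the conditional (``BPF_ALU``) and unconditional
(``BPF_ALU64``) forms can be sketched in Python. The function names are
hypothetical, and the sketch assumes that the result is truncated to the swap
width, which the text above does not state explicitly.

.. code-block:: python

  import sys

  def bswap(dst, width):
      """BPF_ALU64 | BPF_END: reverse the bytes of the low 'width' bits."""
      value = dst & ((1 << width) - 1)
      return int.from_bytes(value.to_bytes(width // 8, "little"), "big")

  def to_endian(dst, width, target, host=sys.byteorder):
      """BPF_ALU | BPF_END: swap only if host and target byte order differ."""
      if host == target:
          return dst & ((1 << width) - 1)
      return bswap(dst, width)

On a little-endian host, ``to_endian(0x1234, 16, "little")`` leaves the value
as ``0x1234``, whereas ``bswap(0x1234, 16)`` always returns ``0x3412``.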

Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below:

========  =====  ===  ===========================================  =========================================
code      value  src  description                                  notes
========  =====  ===  ===========================================  =========================================
BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP class
BPF_JA    0x0    0x0  PC += imm                                    BPF_JMP32 class
BPF_JEQ   0x1    any  PC += offset if dst == src
BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
BPF_JSET  0x4    any  PC += offset if dst & src
BPF_JNE   0x5    any  PC += offset if dst != src
BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
BPF_CALL  0x8    0x1  call PC += offset                            see `Program-local functions`_
BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
========  =====  ===  ===========================================  =========================================

The eBPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.
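
The difference between signed and unsigned comparisons, and between the
32-bit and 64-bit classes, can be illustrated as follows. This Python sketch
only evaluates the branch condition; the function names are hypothetical and
registers are modelled as non-negative integers holding 64-bit patterns.

.. code-block:: python

  BPF_JSGE, BPF_X, BPF_JMP32 = 0x70, 0x08, 0x06

  def to_signed(value, width):
      """Read the low 'width' bits of value as a two's-complement number."""
      value &= (1 << width) - 1
      return value - (1 << width) if value >> (width - 1) else value

  def jge(dst, src, is64):
      """BPF_JGE condition: unsigned '>='."""
      mask = (1 << (64 if is64 else 32)) - 1
      return (dst & mask) >= (src & mask)

  def jsge(dst, src, is64):
      """BPF_JSGE condition: signed '>='."""
      width = 64 if is64 else 32
      return to_signed(dst, width) >= to_signed(src, width)

With ``dst = 0xFFFFFFFF`` and ``src = 1``, the unsigned 32-bit comparison
``jge(dst, src, False)`` holds, the signed 32-bit comparison
``jsge(dst, src, False)`` does not (``dst`` reads as -1), and the signed 64-bit
comparison ``jsge(dst, src, True)`` holds again.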

``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means::

  gotol +imm

where 'imm' means the branch offset comes from insn 'imm' field.

Note that there are two flavors of ``BPF_JA`` instructions. The
``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset'
field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset
specified by the 'imm' field. A conditional jump whose target lies beyond
the range of a 16-bit offset may be converted to a conditional jump within
16-bit range plus a 32-bit unconditional jump.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field.  The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~

Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``.  A ``BPF_EXIT`` within the program-local function will return to
the caller.

Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

  =============  =====  ====================================  =============
  mode modifier  value  description                           reference
  =============  =====  ====================================  =============
  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
  BPF_MEMSX      0x80   sign-extension load operations        `Sign-extension load operations`_
  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
  =============  =====  ====================================  =============

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================
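
As with arithmetic and jump opcodes, the split amounts to three bit masks.
The Python sketch below is illustrative only, and the function name
``split_ldst_opcode`` is hypothetical.

.. code-block:: python

  BPF_LD, BPF_LDX, BPF_ST, BPF_STX = 0x00, 0x01, 0x02, 0x03
  BPF_W, BPF_H, BPF_B, BPF_DW = 0x00, 0x08, 0x10, 0x18
  BPF_IMM, BPF_MEM, BPF_ATOMIC = 0x00, 0x60, 0xC0

  def split_ldst_opcode(opcode):
      """Return (mode, size, instruction class) of a load or store opcode."""
      mode = opcode & 0xE0   # 3 most significant bits
      size = opcode & 0x18   # 2 bits
      cls = opcode & 0x07    # 3 least significant bits
      return mode, size, cls

For example, opcode 0x18 splits into ``(BPF_IMM, BPF_DW, BPF_LD)`` and opcode
0x61 into ``(BPF_MEM, BPF_W, BPF_LDX)``.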

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(unsigned size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and
'unsigned size' is one of u8, u16, u32 or u64.
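
A toy model of these operations is sketched below. It rests on several
assumptions that are not part of this specification: memory is a Python
``bytearray`` indexed directly by address, the host is little-endian by
default, a store truncates the source value to the access size, and the
function names ``stx`` and ``ldx`` are hypothetical.

.. code-block:: python

  SIZES = {"BPF_B": 1, "BPF_H": 2, "BPF_W": 4, "BPF_DW": 8}

  def stx(mem, dst, offset, src, size, byteorder="little"):
      """BPF_MEM | <size> | BPF_STX: *(size *) (dst + offset) = src."""
      n = SIZES[size]
      addr = dst + offset
      value = src & ((1 << (8 * n)) - 1)      # keep only the low n bytes
      mem[addr:addr + n] = value.to_bytes(n, byteorder)

  def ldx(mem, src, offset, size, byteorder="little"):
      """BPF_MEM | <size> | BPF_LDX: zero-extending load."""
      n = SIZES[size]
      addr = src + offset
      return int.from_bytes(mem[addr:addr + n], byteorder)

  mem = bytearray(16)
  stx(mem, 8, -4, 0x1122334455667788, "BPF_W")   # writes bytes 4..7
  word = ldx(mem, 0, 4, "BPF_W")

Here ``word`` is ``0x55667788``: the 4-byte store kept only the low half of
the source register, and the load zero-extended it.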

Sign-extension load operations
------------------------------

The ``BPF_MEMSX`` mode modifier is used to encode sign-extension load
instructions that transfer data between a register and memory.

``BPF_MEMSX | <size> | BPF_LDX`` means::

  dst = *(signed size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and
'signed size' is one of s8, s16 or s32.
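
A sign-extending load can be modelled in the same toy fashion. The sketch
assumes that memory is a Python ``bytearray`` indexed by address, that the
host is little-endian by default, and that the loaded value is sign extended
to the full 64-bit destination register; the function name ``ldsx`` is
hypothetical.

.. code-block:: python

  def ldsx(mem, src, offset, nbytes, byteorder="little"):
      """BPF_MEMSX | <size> | BPF_LDX for nbytes in (1, 2, 4)."""
      addr = src + offset
      value = int.from_bytes(mem[addr:addr + nbytes], byteorder, signed=True)
      return value & 0xFFFFFFFFFFFFFFFF   # 64-bit register pattern

  mem = bytearray([0x80, 0xFF, 0x7F, 0x00])
  negative = ldsx(mem, 0, 0, 1)   # byte 0x80 reads as -128
  positive = ldsx(mem, 0, 2, 1)   # byte 0x7F reads as +127

Here ``negative`` is ``0xFFFFFFFFFFFFFF80`` and ``positive`` is ``0x7F``.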

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other eBPF programs or means outside of this specification.

All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========

``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.
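
The register and memory effects of these operations can be modelled
sequentially as below. The sketch does not provide any atomicity itself; it
only shows what a single operation reads and writes. Memory is assumed to be
a Python ``dict`` from address to value, and the function name ``atomic_op``
and its return convention are hypothetical.

.. code-block:: python

  BPF_ADD, BPF_OR, BPF_AND, BPF_XOR = 0x00, 0x40, 0x50, 0xA0
  BPF_FETCH = 0x01
  BPF_XCHG = 0xE0 | BPF_FETCH
  BPF_CMPXCHG = 0xF0 | BPF_FETCH

  def atomic_op(mem, addr, imm, src, r0, width=64):
      """Apply one operation to mem[addr]; return the new (src, R0) values."""
      mask = (1 << width) - 1
      old = mem[addr] & mask
      if imm == BPF_CMPXCHG:
          if old == (r0 & mask):
              mem[addr] = src & mask
          return src, old              # R0 receives the old value either way
      if imm == BPF_XCHG:
          mem[addr] = src & mask
          return old, r0
      op = imm & ~BPF_FETCH
      if op == BPF_ADD:
          new = old + src
      elif op == BPF_OR:
          new = old | src
      elif op == BPF_AND:
          new = old & src
      elif op == BPF_XOR:
          new = old ^ src
      else:
          raise ValueError("unsupported atomic operation")
      mem[addr] = new & mask
      # With BPF_FETCH, src is overwritten with the value previously in memory.
      return (old if imm & BPF_FETCH else src), r0

For example, with ``mem = {0: 5}``, the call
``atomic_op(mem, 0, BPF_ADD | BPF_FETCH, 3, 0)`` returns ``(5, 0)`` and leaves
``mem[0]`` equal to 8.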

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src' field, using new terms such as "map"
defined further below:

=========================  ======  ===  =========================================  ===========  ==============
opcode construction        opcode  src  pseudocode                                 imm type     dst type
=========================  ======  ===  =========================================  ===========  ==============
BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
=========================  ======  ===  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes
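
To make the wide encoding concrete, the first row of the table
(``dst = imm64``, src = 0x0) can be encoded as in the following Python sketch.
It assumes little-endian BPF, and the function name ``encode_ld_imm64`` is
hypothetical; the ``src`` parameter is the opcode subtype stored in the
'src_reg' position of the basic instruction.

.. code-block:: python

  import struct

  def encode_ld_imm64(dst_reg, value, src=0x0):
      """Encode BPF_IMM | BPF_DW | BPF_LD (opcode 0x18) as 16 bytes."""
      low = value & 0xFFFFFFFF
      high = (value >> 32) & 0xFFFFFFFF
      regs = (src << 4) | dst_reg          # little-endian BPF nibble order
      basic = struct.pack("<BBhI", 0x18, regs, 0, low)
      # Pseudo instruction: every field is zero except imm (the high half).
      pseudo = struct.pack("<BBhI", 0, 0, 0, high)
      return basic + pseudo

  wide = encode_ld_imm64(dst_reg=1, value=0x1122334455667788)

Under these assumptions ``wide.hex()`` is
``18010000887766550000000044332211``.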

Maps
~~~~

Maps are shared memory regions accessible by eBPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms.  The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

Legacy BPF Packet access instructions
-------------------------------------

eBPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.
535