.. contents::
.. sectnum::

========================================
eBPF Instruction Set Specification, v1.0
========================================

This document specifies version 1.0 of the eBPF instruction set.

Documentation conventions
=========================

For brevity, this document uses the type notation "u64", "u32", etc.
to mean an unsigned integer whose width is the specified number of bits,
and "s32", etc. to mean a signed integer of the specified number of bits.

Registers and calling convention
================================

eBPF has 10 general purpose registers and a read-only frame pointer register,
all of which are 64 bits wide.

The eBPF calling convention is defined as:

* R0: return value from function calls, and exit value for eBPF programs
* R1 - R5: arguments for function calls
* R6 - R9: callee saved registers that function calls will preserve
* R10: read-only frame pointer to access stack

R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
necessary across calls.

Instruction encoding
====================

eBPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The fields of an encoded basic instruction are stored in the
following order::

  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.

**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that the contents of multi-byte fields ('imm' and 'offset') are
stored using big-endian byte ordering in big-endian BPF and
little-endian byte ordering in little-endian BPF.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

This is depicted in the following figure::

        basic_instruction
  .-----------------------------.
  |                             |
  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
                                 |              |
                                 '--------------'
                                pseudo instruction

Thus the 64-bit immediate value is constructed as follows:

  imm64 = (next_imm << 32) | imm

where 'next_imm' refers to the imm value of the pseudo instruction
following the basic instruction.  The unused bytes in the pseudo
instruction are reserved and shall be cleared to zero.
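
As an illustration only, the basic instruction layout and the 'imm64'
construction described above can be sketched in C. The sketch assumes
little-endian BPF and reads the register byte with 'src_reg' in the upper
nibble and 'dst_reg' in the lower nibble, which is how the field order and the
example above are interpreted here. The ``struct insn`` type and the
``decode_le`` and ``make_imm64`` names are hypothetical and are not defined by
this specification.

```c
#include <stdint.h>

/* Hypothetical decoded form of one 64-bit basic instruction. */
struct insn {
    uint8_t opcode;
    uint8_t dst_reg;
    uint8_t src_reg;
    int16_t offset;
    int32_t imm;
};

/* Decode the 8 bytes of a basic instruction as stored in little-endian BPF.
 * Assumption: src_reg is the upper nibble and dst_reg the lower nibble of
 * the second byte. */
static struct insn decode_le(const uint8_t b[8])
{
    struct insn i;

    i.opcode  = b[0];
    i.dst_reg = b[1] & 0x0f;
    i.src_reg = b[1] >> 4;
    /* Multi-byte fields are little-endian: lowest byte first. */
    i.offset  = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    i.imm     = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                          ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return i;
}

/* imm64 = (next_imm << 32) | imm, with both halves treated as raw 32-bit
 * patterns so that a negative 'imm' does not sign-extend into the high half. */
static uint64_t make_imm64(int32_t imm, int32_t next_imm)
{
    return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
}
```

Feeding the little-endian example bytes ``07 01 00 00 44 33 22 11`` to
``decode_le`` yields opcode 0x07, 'dst_reg' 1, 'src_reg' 0 and 'imm'
0x11223344. A big-endian decoder would swap the two nibbles and reverse the
byte order of 'offset' and 'imm'.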

Instruction classes
-------------------

The three least significant bits (LSB) of the 'opcode' field store the instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)

Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

========  =====  ==========================================================
code      value  description
========  =====  ==========================================================
BPF_ADD   0x00   dst += src
BPF_SUB   0x10   dst -= src
BPF_MUL   0x20   dst \*= src
BPF_DIV   0x30   dst = (src != 0) ? (dst / src) : 0
BPF_OR    0x40   dst \|= src
BPF_AND   0x50   dst &= src
BPF_LSH   0x60   dst <<= (src & mask)
BPF_RSH   0x70   dst >>= (src & mask)
BPF_NEG   0x80   dst = -dst
BPF_MOD   0x90   dst = (src != 0) ? (dst % src) : dst
BPF_XOR   0xa0   dst ^= src
BPF_MOV   0xb0   dst = src
BPF_ARSH  0xc0   sign extending dst >>= (src & mask)
BPF_END   0xd0   byte swap operations (see `Byte swap instructions`_ below)
========  =====  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If eBPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.
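
The wrapping and zero-divisor rules stated above can be written out in C as a
minimal sketch. The function names are hypothetical, registers are modelled
as plain ``uint64_t`` values, and only the register-source (``BPF_X``) forms
are shown.

```c
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: 32-bit add that wraps, upper 32 bits zeroed. */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
    return (uint32_t)((uint32_t)dst + (uint32_t)src);
}

/* BPF_DIV | BPF_X | BPF_ALU64: unsigned divide, zero divisor yields 0. */
static uint64_t alu64_div(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst / src : 0;
}

/* BPF_MOD | BPF_X | BPF_ALU64: unsigned modulo, zero divisor leaves dst
 * unchanged. */
static uint64_t alu64_mod(uint64_t dst, uint64_t src)
{
    return src != 0 ? dst % src : dst;
}

/* BPF_MOD | BPF_X | BPF_ALU: 32-bit unsigned modulo. With a zero divisor
 * the low 32 bits of dst are kept and the upper 32 bits are zeroed. */
static uint64_t alu32_mod(uint64_t dst, uint64_t src)
{
    uint32_t d = (uint32_t)dst;
    uint32_t s = (uint32_t)src;

    return s != 0 ? d % s : d;
}
```

For instance, ``alu32_add(0xffffffff, 1)`` wraps to 0, and
``alu32_mod(0x100000007, 0)`` returns 7 because only the upper half of the
destination is cleared.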

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm32

Also note that the division and modulo operations are unsigned. Thus, for
``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
for ``BPF_ALU64``, 'imm' is first sign extended to 64 bits and the result
interpreted as an unsigned 64-bit value. There are no instructions for
signed division or modulo.

Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.

Byte swap instructions
~~~~~~~~~~~~~~~~~~~~~~

The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

The 1-bit source operand field in the opcode is used to select what byte
order the operation converts from or to:

=========  =====  =================================================
source     value  description
=========  =====  =================================================
BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_TO_BE  0x08   convert between host byte order and big endian
=========  =====  =================================================

The 'imm' field encodes the width of the swap operations.  The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

  dst = htole16(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

  dst = htobe64(dst)

Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below:

========  =====  ===  ===========================================  =========================================
code      value  src  description                                  notes
========  =====  ===  ===========================================  =========================================
BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP only
BPF_JEQ   0x1    any  PC += offset if dst == src
BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
BPF_JSET  0x4    any  PC += offset if dst & src
BPF_JNE   0x5    any  PC += offset if dst != src
BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
BPF_CALL  0x8    0x1  call PC += offset                            see `Program-local functions`_
BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
========  =====  ===  ===========================================  =========================================

The 'value' column in this table gives the unshifted 4-bit 'code'. Because the
'code' occupies the four most significant bits of the opcode, ``BPF_JSGE`` (0x7)
contributes 0x70 to the opcode byte.

The eBPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field.  The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~

Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``.  A ``BPF_EXIT`` within the program-local function will return to
the caller.

Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

  =============  =====  ====================================  =============
  mode modifier  value  description                           reference
  =============  =====  ====================================  =============
  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
  =============  =====  ====================================  =============

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(size *) (src + offset)

Where 'size' is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other eBPF programs or means outside of this specification.

All atomic operations supported by eBPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations reuse a subset of the arithmetic operation 'code'
values in the 'imm' field to select the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========


``BPF_ATOMIC | BPF_W  | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations.  If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.
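
The semantics of ``BPF_FETCH``, ``BPF_XCHG`` and ``BPF_CMPXCHG`` can be
approximated with C11 atomics, shown here for the 64-bit
(``BPF_DW``) case. This is a sketch only: the ``sketch_*`` function names are
hypothetical, and the sequentially consistent ordering that C11 applies by
default is an assumption, since this document does not define a memory model.

```c
#include <stdatomic.h>
#include <stdint.h>

/* imm = BPF_ADD | BPF_FETCH: add src to memory and return the value that
 * was in memory beforehand (the value that would be written back to src). */
static uint64_t sketch_fetch_add64(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_fetch_add(addr, src);
}

/* imm = BPF_XCHG: store src to memory and return the previous contents. */
static uint64_t sketch_xchg64(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_exchange(addr, src);
}

/* imm = BPF_CMPXCHG: compare memory with r0 and, on a match, store src.
 * The return value is the new R0, i.e. the value that was in memory before
 * the operation, whether or not the exchange took place. */
static uint64_t sketch_cmpxchg64(_Atomic uint64_t *addr, uint64_t r0,
                                 uint64_t src)
{
    /* On mismatch, r0 is overwritten with the current memory value;
     * on match, r0 already equals the old memory value. */
    atomic_compare_exchange_strong(addr, &r0, src);
    return r0;
}
```

With memory initially holding 1, ``sketch_cmpxchg64(&m, 7, 9)`` leaves memory
at 1 and returns 1, while ``sketch_cmpxchg64(&m, 1, 9)`` stores 9 and also
returns 1.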

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src_reg' field (column 'src'), using new terms such
as "map" defined further below:

=========================  ======  ===  =========================================  ===========  ==============
opcode construction        opcode  src  pseudocode                                 imm type     dst type
=========================  ======  ===  =========================================  ===========  ==============
BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
=========================  ======  ===  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes

Maps
~~~~

Maps are shared memory regions accessible by eBPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms.  The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

Legacy BPF Packet access instructions
-------------------------------------

eBPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.