.. contents::
.. sectnum::

=======================================
BPF Instruction Set Specification, v1.0
=======================================

This document specifies version 1.0 of the BPF instruction set.

Documentation conventions
=========================

For brevity and consistency, this document refers to families
of types using a shorthand syntax and refers to several expository,
mnemonic functions when describing the semantics of instructions.
The range of valid values for those types and the semantics of those
functions are defined in the following subsections.

Types
-----
This document refers to integer types with the notation `SN` to specify
a type's signedness (`S`) and bit width (`N`), respectively.

.. table:: Meaning of signedness notation.

  ==== =========
  `S`  Meaning
  ==== =========
  `u`  unsigned
  `s`  signed
  ==== =========

.. table:: Meaning of bit-width notation.

  ===== =========
  `N`   Bit width
  ===== =========
  `8`   8 bits
  `16`  16 bits
  `32`  32 bits
  `64`  64 bits
  `128` 128 bits
  ===== =========

For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.

Functions
---------
* `htobe16`: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in big-endian
  format.
* `htobe32`: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in big-endian
  format.
* `htobe64`: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in big-endian
  format.
* `htole16`: Takes an unsigned 16-bit number in host-endian format and
  returns the equivalent number as an unsigned 16-bit number in little-endian
  format.
* `htole32`: Takes an unsigned 32-bit number in host-endian format and
  returns the equivalent number as an unsigned 32-bit number in little-endian
  format.
* `htole64`: Takes an unsigned 64-bit number in host-endian format and
  returns the equivalent number as an unsigned 64-bit number in little-endian
  format.
* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.


Definitions
-----------

.. glossary::

  Sign Extend
    To *sign extend* an ``X``-bit number, `A`, to a ``Y``-bit number, `B`, means to

    #. Copy all ``X`` bits from `A` to the lower ``X`` bits of `B`.
    #. Set the value of the remaining ``Y`` - ``X`` bits of `B` to the value of
       the most-significant bit of `A`.

.. admonition:: Example

  Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform:
  ::

    A:          10000110
    B: 11111111 10000110

Instruction encoding
====================

BPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
  constant) value after the basic instruction for a total of 128 bits.

The fields of an encoded basic instruction are stored in the
following order::

  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.


**imm**
  signed integer immediate value

**offset**
  signed integer offset used with pointer arithmetic

**src_reg**
  the source register number (0-10), except where otherwise specified
  (`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
  destination register number (0-10)

**opcode**
  operation to perform

Note that the contents of multi-byte fields ('imm' and 'offset') are
stored using big-endian byte ordering in big-endian BPF and
little-endian byte ordering in little-endian BPF.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

This is depicted in the following figure::

        basic_instruction
  .-----------------------------.
  |                             |
  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
                                 |              |
                                 '--------------'
                                pseudo instruction

Thus the 64-bit immediate value is constructed as follows::

  imm64 = (next_imm << 32) | imm

where 'next_imm' refers to the imm value of the pseudo instruction
following the basic instruction. The unused bytes in the pseudo
instruction are reserved and shall be cleared to zero.
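
As a non-normative illustration, the field layout and the ``imm64``
construction described above could be decoded as in the following C sketch.
The ``struct bpf_insn_fields`` type and the ``decode_le`` and ``make_imm64``
helpers are hypothetical names introduced only for this example. The sketch
handles little-endian BPF only and assumes that both 32-bit halves are treated
as unsigned quantities when they are combined.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical decoded view of one 64-bit basic instruction.
   * The type name and member order are illustrative only. */
  struct bpf_insn_fields {
      uint8_t opcode;
      uint8_t dst_reg;
      uint8_t src_reg;
      int16_t offset;
      int32_t imm;
  };

  /* Decode 8 bytes stored in little-endian BPF order:
   * opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32, where dst_reg
   * occupies the low nibble of the second byte. */
  static struct bpf_insn_fields decode_le(const uint8_t b[8])
  {
      struct bpf_insn_fields f;

      f.opcode  = b[0];
      f.dst_reg = b[1] & 0x0f;
      f.src_reg = b[1] >> 4;
      f.offset  = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
      f.imm     = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                            ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
      return f;
  }

  /* imm64 = (next_imm << 32) | imm.
   * Assumption: the low half is zero-extended, not sign-extended,
   * so that it cannot overwrite the high half. */
  static uint64_t make_imm64(int32_t imm, int32_t next_imm)
  {
      return ((uint64_t)(uint32_t)next_imm << 32) | (uint32_t)imm;
  }

Applied to the little-endian bytes ``07 01 00 00 44 33 22 11`` from the
example above, ``decode_le`` would yield opcode 0x07, dst_reg 1, src_reg 0,
offset 0 and imm 0x11223344.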

Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction class:

=========  =====  ===============================  ===================================
class      value  description                      reference
=========  =====  ===============================  ===================================
BPF_LD     0x00   non-standard load operations     `Load and store instructions`_
BPF_LDX    0x01   load into register operations    `Load and store instructions`_
BPF_ST     0x02   store from immediate operations  `Load and store instructions`_
BPF_STX    0x03   store from register operations   `Load and store instructions`_
BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump instructions`_
BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump instructions`_
BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump instructions`_
BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump instructions`_
=========  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` and
``BPF_JMP32``), the 8-bit 'opcode' field is divided into three parts:

==============  ======  =================
4 bits (MSB)    1 bit   3 bits (LSB)
==============  ======  =================
code            source  instruction class
==============  ======  =================

**code**
  the operation code, whose meaning varies by instruction class

**source**
  the source operand location, which unless otherwise specified is one of:

  ======  =====  ==============================================
  source  value  description
  ======  =====  ==============================================
  BPF_K   0x00   use 32-bit 'imm' value as source operand
  BPF_X   0x08   use 'src_reg' register value as source operand
  ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)

Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

=========  =====  =======  ==========================================================
code       value  offset   description
=========  =====  =======  ==========================================================
BPF_ADD    0x00   0        dst += src
BPF_SUB    0x10   0        dst -= src
BPF_MUL    0x20   0        dst \*= src
BPF_DIV    0x30   0        dst = (src != 0) ? (dst / src) : 0
BPF_SDIV   0x30   1        dst = (src != 0) ? (dst s/ src) : 0
BPF_OR     0x40   0        dst \|= src
BPF_AND    0x50   0        dst &= src
BPF_LSH    0x60   0        dst <<= (src & mask)
BPF_RSH    0x70   0        dst >>= (src & mask)
BPF_NEG    0x80   0        dst = -dst
BPF_MOD    0x90   0        dst = (src != 0) ? (dst % src) : dst
BPF_SMOD   0x90   1        dst = (src != 0) ? (dst s% src) : dst
BPF_XOR    0xa0   0        dst ^= src
BPF_MOV    0xb0   0        dst = src
BPF_MOVSX  0xb0   8/16/32  dst = (s8,s16,s32)src
BPF_ARSH   0xc0   0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
BPF_END    0xd0   0        byte swap operations (see `Byte swap instructions`_ below)
=========  =====  =======  ==========================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If BPF program execution would
result in division by zero, the destination register is instead set to zero.
If execution would result in modulo by zero, for ``BPF_ALU64`` the value of
the destination register is unchanged whereas for ``BPF_ALU`` the upper
32 bits of the destination register are zeroed.
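
The wrapping, division-by-zero and modulo-by-zero rules above can be sketched
in C as follows. This is a non-normative illustration: the function names are
hypothetical, registers are modelled as plain ``uint64_t`` values, and only
the unsigned variants are shown.

.. code-block:: c

  #include <stdint.h>

  /* BPF_DIV | BPF_ALU64: a zero divisor yields 0. */
  static uint64_t alu64_div(uint64_t dst, uint64_t src)
  {
      return src != 0 ? dst / src : 0;
  }

  /* BPF_MOD | BPF_ALU64: a zero divisor leaves dst unchanged. */
  static uint64_t alu64_mod(uint64_t dst, uint64_t src)
  {
      return src != 0 ? dst % src : dst;
  }

  /* BPF_DIV | BPF_ALU: operate on the low 32 bits; the result is
   * zero-extended, so the upper 32 bits of dst end up cleared. */
  static uint64_t alu32_div(uint64_t dst, uint64_t src)
  {
      uint32_t d = (uint32_t)dst, s = (uint32_t)src;

      return s != 0 ? d / s : 0;
  }

  /* BPF_MOD | BPF_ALU: with a zero divisor the low 32 bits are kept
   * and the upper 32 bits are still zeroed. */
  static uint64_t alu32_mod(uint64_t dst, uint64_t src)
  {
      uint32_t d = (uint32_t)dst, s = (uint32_t)src;

      return s != 0 ? d % s : d;
  }

  /* BPF_ADD | BPF_ALU: 32-bit addition wraps on overflow. */
  static uint64_t alu32_add(uint64_t dst, uint64_t src)
  {
      return (uint32_t)((uint32_t)dst + (uint32_t)src);
  }

Unsigned arithmetic is used throughout the sketch because unsigned overflow
wraps in C, which matches the wrapping behaviour described above.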

``BPF_ADD | BPF_X | BPF_ALU`` means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``BPF_ADD | BPF_X | BPF_ALU64`` means::

  dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

  dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

  dst = dst ^ imm32

Note that most instructions have an instruction offset of 0. Only three instructions
(``BPF_SDIV``, ``BPF_SMOD``, ``BPF_MOVSX``) have a non-zero offset.

The division and modulo operations support both unsigned and signed flavors.

For unsigned operations (``BPF_DIV`` and ``BPF_MOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit unsigned value. For ``BPF_ALU64``,
'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit unsigned value.

For signed operations (``BPF_SDIV`` and ``BPF_SMOD``), for ``BPF_ALU``,
'imm' is interpreted as a 32-bit signed value. For ``BPF_ALU64``, 'imm'
is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit signed value.

The ``BPF_MOVSX`` instruction does a move operation with sign extension.
``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
32-bit operands, and zeroes the remaining upper 32 bits.
``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64-bit operands.

Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.

Byte swap instructions
----------------------

The byte swap instructions use instruction classes of ``BPF_ALU`` and ``BPF_ALU64``
and a 4-bit 'code' field of ``BPF_END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

For ``BPF_ALU``, the 1-bit source operand field in the opcode is used to
select what byte order the operation converts from or to. For
``BPF_ALU64``, the 1-bit source operand field in the opcode is reserved
and must be set to 0.

=========  =========  =====  =================================================
class      source     value  description
=========  =========  =====  =================================================
BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little endian
BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big endian
BPF_ALU64  Reserved   0x00   do byte swap unconditionally
=========  =========  =====  =================================================

The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64.

Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = htole16(dst)
  dst = htole32(dst)
  dst = htole64(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means::

  dst = htobe16(dst)
  dst = htobe32(dst)
  dst = htobe64(dst)

``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
  dst = bswap64(dst)

Jump instructions
-----------------

``BPF_JMP32`` uses 32-bit wide operands while ``BPF_JMP`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below:

========  =====  ===  ===========================================  =========================================
code      value  src  description                                  notes
========  =====  ===  ===========================================  =========================================
BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP class
BPF_JA    0x0    0x0  PC += imm                                    BPF_JMP32 class
BPF_JEQ   0x1    any  PC += offset if dst == src
BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
BPF_JSET  0x4    any  PC += offset if dst & src
BPF_JNE   0x5    any  PC += offset if dst != src
BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
BPF_CALL  0x8    0x1  call PC += imm                               see `Program-local functions`_
BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
========  =====  ===  ===========================================  =========================================

The BPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.

``BPF_JA | BPF_K | BPF_JMP32`` (0x06) means::

  gotol +imm

where 'imm' means the branch offset comes from the instruction's 'imm' field.

Note that there are two flavors of ``BPF_JA`` instructions. The
``BPF_JMP`` class permits a 16-bit jump offset specified by the 'offset'
field, whereas the ``BPF_JMP32`` class permits a 32-bit jump offset
specified by the 'imm' field. A conditional jump whose target lies beyond
the range of a 16-bit offset may be converted into a conditional jump with
a 16-bit offset plus a 32-bit unconditional jump.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of functions exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field. The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``. The offset is encoded in the imm field of the call instruction.
A ``BPF_EXIT`` within the program-local function will return to the caller.
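
To illustrate how the operand width and signedness of the conditional jumps
in this section interact, the following non-normative C sketch evaluates a
jump condition for either the ``BPF_JMP`` or the ``BPF_JMP32`` class. The
``jmp_taken`` function and its ``is_jmp32`` parameter are hypothetical names
used only here; the enumerators carry the 4-bit 'code' values from the jump
table, and the sketch assumes a two's complement conversion from unsigned to
signed integers.

.. code-block:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* 4-bit 'code' values as listed in the jump table. */
  enum {
      BPF_JEQ = 0x1, BPF_JGT = 0x2, BPF_JGE = 0x3, BPF_JSET = 0x4,
      BPF_JNE = 0x5, BPF_JSGT = 0x6, BPF_JSGE = 0x7, BPF_JLT = 0xa,
      BPF_JLE = 0xb, BPF_JSLT = 0xc, BPF_JSLE = 0xd
  };

  /* Returns true if the conditional jump would be taken.
   * For BPF_JMP32 only the low 32 bits of each operand are compared. */
  static bool jmp_taken(uint8_t code, uint64_t dst, uint64_t src, bool is_jmp32)
  {
      int64_t sdst, ssrc;

      if (is_jmp32) {
          dst = (uint32_t)dst;
          src = (uint32_t)src;
          /* Assumption: conversion to int32_t wraps (two's complement). */
          sdst = (int32_t)(uint32_t)dst;
          ssrc = (int32_t)(uint32_t)src;
      } else {
          sdst = (int64_t)dst;
          ssrc = (int64_t)src;
      }

      switch (code) {
      case BPF_JEQ:  return dst == src;
      case BPF_JNE:  return dst != src;
      case BPF_JSET: return (dst & src) != 0;
      case BPF_JGT:  return dst > src;     /* unsigned */
      case BPF_JGE:  return dst >= src;
      case BPF_JLT:  return dst < src;
      case BPF_JLE:  return dst <= src;
      case BPF_JSGT: return sdst > ssrc;   /* signed */
      case BPF_JSGE: return sdst >= ssrc;
      case BPF_JSLT: return sdst < ssrc;
      case BPF_JSLE: return sdst <= ssrc;
      default:       return false;         /* not a conditional jump */
      }
  }

With this sketch, a register holding 0xffffffff compares as -1 under
``BPF_JSGE`` in the ``BPF_JMP32`` class but as a large positive value in the
``BPF_JMP`` class, so the same operands can take different branches depending
on the instruction class.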

Load and store instructions
===========================

For load and store instructions (``BPF_LD``, ``BPF_LDX``, ``BPF_ST``, and ``BPF_STX``), the
8-bit 'opcode' field is divided as:

============  ======  =================
3 bits (MSB)  2 bits  3 bits (LSB)
============  ======  =================
mode          size    instruction class
============  ======  =================

The mode modifier is one of:

  =============  =====  ====================================  ========================================
  mode modifier  value  description                           reference
  =============  =====  ====================================  ========================================
  BPF_IMM        0x00   64-bit immediate instructions         `64-bit immediate instructions`_
  BPF_ABS        0x20   legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
  BPF_IND        0x40   legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
  BPF_MEM        0x60   regular load and store operations     `Regular load and store operations`_
  BPF_MEMSX      0x80   sign-extension load operations        `Sign-extension load operations`_
  BPF_ATOMIC     0xc0   atomic operations                     `Atomic operations`_
  =============  =====  ====================================  ========================================

The size modifier is one of:

  =============  =====  =====================
  size modifier  value  description
  =============  =====  =====================
  BPF_W          0x00   word        (4 bytes)
  BPF_H          0x08   half word   (2 bytes)
  BPF_B          0x10   byte
  BPF_DW         0x18   double word (8 bytes)
  =============  =====  =====================

Regular load and store operations
---------------------------------

The ``BPF_MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

  *(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

  *(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

  dst = *(unsigned size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW`` and
'unsigned size' is one of u8, u16, u32 or u64.

Sign-extension load operations
------------------------------

The ``BPF_MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load
instructions that transfer data between a register and memory.

``BPF_MEMSX | <size> | BPF_LDX`` means::

  dst = *(signed size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H`` or ``BPF_W``, and
'signed size' is one of s8, s16 or s32.

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other BPF programs or means outside of this specification.

All atomic operations supported by BPF are encoded as store operations
that use the ``BPF_ATOMIC`` mode modifier as follows:

* ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations
* ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

========  =====  ===========
imm       value  description
========  =====  ===========
BPF_ADD   0x00   atomic add
BPF_OR    0x40   atomic or
BPF_AND   0x50   atomic and
BPF_XOR   0xa0   atomic xor
========  =====  ===========


``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF_ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

===========  ================  ===========================
imm          value             description
===========  ================  ===========================
BPF_FETCH    0x01              modifier: return old value
BPF_XCHG     0xe0 | BPF_FETCH  atomic exchange
BPF_CMPXCHG  0xf0 | BPF_FETCH  atomic compare and exchange
===========  ================  ===========================

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.
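
The 64-bit forms of the fetching and complex atomic operations described
above map closely onto C11 ``<stdatomic.h>`` primitives, as the following
non-normative sketch shows. The function names are hypothetical, and the
sequentially consistent ordering implied by the default C11 operations is an
assumption of this example, not something this document specifies.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdint.h>

  /* BPF_ATOMIC | BPF_DW | BPF_STX with imm = BPF_ADD | BPF_FETCH:
   * adds src to memory and returns the old value (the new 'src'). */
  static uint64_t atomic64_fetch_add(_Atomic uint64_t *addr, uint64_t src)
  {
      return atomic_fetch_add(addr, src);
  }

  /* imm = BPF_XCHG: stores src and returns the old value (the new 'src'). */
  static uint64_t atomic64_xchg(_Atomic uint64_t *addr, uint64_t src)
  {
      return atomic_exchange(addr, src);
  }

  /* imm = BPF_CMPXCHG: if *addr equals r0, store src. The value that was
   * in memory before the operation is returned (the new R0) either way. */
  static uint64_t atomic64_cmpxchg(_Atomic uint64_t *addr, uint64_t r0,
                                   uint64_t src)
  {
      uint64_t expected = r0;

      /* On failure, C11 writes the current memory value into 'expected';
       * on success 'expected' already equals the old memory value. */
      atomic_compare_exchange_strong(addr, &expected, src);
      return expected;
  }

Returning ``expected`` in both outcomes mirrors the rule that ``R0`` receives
the prior memory contents whether or not the comparison matched.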

64-bit immediate instructions
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src_reg' field (column 'src'), using new terms such
as "map" defined further below:

=========================  ======  ===  =========================================  ===========  ==============
opcode construction        opcode  src  pseudocode                                 imm type     dst type
=========================  ======  ===  =========================================  ===========  ==============
BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
=========================  ======  ===  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes

Maps
~~~~

Maps are shared memory regions accessible by BPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms. The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

Legacy BPF Packet access instructions
-------------------------------------

BPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.