#
1a7e62e6 |
| 07-Oct-2018 |
Quentin Monnet <quentin.monnet@netronome.com> |
nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth
In preparation for support for BPF to BPF calls in offloaded programs, rename the "stack_depth" field of the struct nfp_prog as "stack_frame_depth". This is to make it clear that the field refers to the maximum size of the current stack frame (as opposed to the maximum size of the whole stack memory).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
Revision tags: v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13 |
|
#
0c261593 |
| 04-Aug-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: xdp_adjust_tail support
Add support for adjust_tail. There are no FW changes needed but add a FW capability just in case there would be any issue with previously released FW, or we will have to change the ABI in the future.
The helper is trivial and shouldn't be used too often, so just inline the body of the function. We add the delta to the locally maintained packet length register and check for overflow, since adding a negative value must overflow if the result is positive. Note that if a delta of 0 were allowed in the kernel, this trick would stop working and we would need one more instruction to compare the lengths before and after the change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
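The overflow check described in this message can be modeled in plain C. This is a minimal sketch, not the emitted NFP code: the function name `adjust_len` is hypothetical, and it assumes the delta is strictly negative, as the note about a delta of 0 implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the length update: returns true and stores the
 * new length when the 32-bit add produces a carry out.
 */
static bool adjust_len(uint32_t len, int32_t delta, uint32_t *new_len)
{
	uint32_t sum = len + (uint32_t)delta;	/* wraps modulo 2^32 */
	bool carry = sum < len;			/* carry out of the add */

	/* A negative delta is a large unsigned value, so the add carries
	 * exactly when len >= -delta, i.e. when the result is not negative.
	 * A delta of 0 never carries, which is why the trick depends on
	 * 0 being rejected elsewhere.
	 */
	if (!carry)
		return false;
	*new_len = sum;
	return true;
}
```

With a length of 100, a delta of -10 carries and yields 90, while a delta of -10 on a length of 5 does not carry and is rejected.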
|
Revision tags: v4.17.12, v4.17.11 |
|
#
ab01f4ac |
| 25-Jul-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: remember maps by ID
Record perf maps by map ID, not raw kernel pointer. This helps with debug messages, because printing pointers to logs is frowned upon, and makes debug easier for the users, as map ID is something they should be more familiar with. Note that perf maps are offload neutral, therefore IDs won't be orphaned.
While at it use a rate limited print helper for the error message.
Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
Revision tags: v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5 |
|
#
9fb410a8 |
| 06-Jul-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h
As we are doing JIT, we would want to use the advanced version of the reciprocal divide (reciprocal_value_adv) to trade performance with host.
We could reduce the required ALU instructions from 4 to 2 or 1.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
2a952b03 |
| 06-Jul-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support u32 divide using reciprocal_div.h
NFP doesn't have an integer divide instruction, so this patch uses the reciprocal algorithm (the basic one, reciprocal_div) to emulate it.
For each u32 divide, we would need 11 instructions to finish the operation.
7 (for multiplication) + 4 (various ALUs) = 11
Given NFP only supports multiplication no bigger than u32, we'd require divisor and dividend no bigger than that as well.
Also, eBPF doesn't support signed divide and enforces this at the C language level by failing compilation. However, the LLVM assembler doesn't enforce this, so a negative constant can leak in as a BPF_K operand through assembly code. We reject such cases as well.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
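The basic reciprocal scheme replaces a u32 divide with one widening multiply plus four simple ALU steps (subtract, shift, add, shift), which lines up with the "4 (various ALUs)" count above. The following is a user-space sketch of that scheme, written from memory of the general technique; the names `rcp_value`, `rcp_divide` and `fls32` are illustrative and are not the kernel's identifiers.

```c
#include <stdint.h>

struct rcp { uint32_t m; uint8_t sh1, sh2; };

/* Position of the highest set bit, 1-based; 0 for x == 0. */
static unsigned int fls32(uint32_t x)
{
	unsigned int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Precompute the magic multiplier and shifts for divisor d (d != 0). */
static struct rcp rcp_value(uint32_t d)
{
	struct rcp R;
	unsigned int l = fls32(d - 1);		/* ceil(log2(d)) */
	uint64_t m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;

	R.m = (uint32_t)m;
	R.sh1 = l < 1 ? l : 1;
	R.sh2 = l > 1 ? l - 1 : 0;
	return R;
}

/* a / d using one multiply and four ALU operations. */
static uint32_t rcp_divide(uint32_t a, struct rcp R)
{
	uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);

	return (t + ((a - t) >> R.sh1)) >> R.sh2;
}
```

In a JIT the divisor is a known constant, so `rcp_value()` would run once at translation time and only the `rcp_divide()` steps would need device instructions.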
|
#
d3d23fdb |
| 06-Jul-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support u16 and u32 multiplications
NFP supports u16 and u32 multiplication. Multiplication is done 8-bits per step, therefore we need 2 steps for u16 and 4 steps for u32.
We also need one start instruction to initialize the sequence and one or two instructions to fetch the result, depending on whether you need the high half of the u32 multiplication.
For ALU64, if either operand is beyond u32's value range, we reject it. One thing to note: if the source operand is BPF_K, then we need to check the "imm" field directly, and we reject it if it is negative, because for ALU64 "imm" (with s32 type) is expected to be sign extended to s64, which NFP mul doesn't support. For ALU32, it is fine for "imm" to be negative though, because the result is 32 bits and there is no difference in the low half of the result between signed and unsigned mul, so we will get the correct result.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
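The step counts above follow from consuming the multiplier 8 bits at a time. The sketch below is only an arithmetic model of that idea, not the NFP instruction sequence; `mul_by_steps` is a hypothetical name. It also shows why a negative ALU32 immediate is harmless: the low 32 bits of a product are the same for signed and unsigned operands.

```c
#include <stdint.h>

/* Model: each step folds in one byte of the multiplier b.
 * steps == 2 covers a u16 multiplier, steps == 4 a u32 multiplier.
 */
static uint64_t mul_by_steps(uint32_t a, uint32_t b, int steps)
{
	uint64_t acc = 0;

	for (int i = 0; i < steps; i++) {
		uint32_t byte = (b >> (8 * i)) & 0xff;

		acc += ((uint64_t)a * byte) << (8 * i);
	}
	return acc;
}
```

Truncating `mul_by_steps(7, (uint32_t)-3, 4)` to 32 bits gives the same bit pattern as the signed product 7 * -3, which is the ALU32 case the message describes.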
|
#
662c5472 |
| 06-Jul-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: rename umin/umax to umin_src/umax_src
The two fields are a copy of umin and umax info of bpf_insn->src_reg generated by verifier.
Rename to make their meaning clear.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
Revision tags: v4.17.4 |
|
#
cc0dff6d |
| 26-Jun-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: allow source ptr type be map ptr in memcpy optimization
Map read is already supported on NFP; this patch enables the optimization for memcpy from map to packet.
This patch also fixes a latent bug which would cause copying from an unexpected address once memcpy for map pointers is enabled. The fixed code path was not exercised before.
Reported-by: Mary Pham <mary.pham@netronome.com> Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
Revision tags: v4.17.3, v4.17.2, v4.17.1, v4.17 |
|
#
c217abcc |
| 18-May-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support arithmetic indirect right shift (BPF_ARSH | BPF_X)
Code logic is similar to arithmetic right shift by constant, and NFP gets the indirect shift amount through the source A operand of PREV_ALU.
It is possible to fall back to logic right shift if the MSB is known to be zero from range info; however, there is no benefit in doing this, given that the logic indirect right shift uses an instruction sequence with the same number of instructions and cycles.
Suppose the MSB of regX is the bit we want to replicate to fill in all the vacant positions, and regY contains the shift amount, then we could use a single instruction to set up both.
[alu, --, regY, OR, regX]
-- NOTE: the PREV_ALU result doesn't need to write to any destination register.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
f43d0f17 |
| 18-May-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support arithmetic right shift by constant (BPF_ARSH | BPF_K)
Code logic is similar to logic right shift, except we also need to set the PREV_ALU result properly, the MSB of which is the bit that will be replicated to fill in all the vacant positions.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
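The relationship between the two shifts can be shown on a single 32-bit word: an arithmetic right shift is a logic right shift whose vacated high bits are filled with copies of the MSB. This is a portable C sketch of that relationship only; `asr32` is a hypothetical name and the code does not model PREV_ALU.

```c
#include <stdint.h>

/* Arithmetic right shift built from a logic shift plus sign fill.
 * amt is assumed to be in the range 0..31.
 */
static uint32_t asr32(uint32_t x, unsigned int amt)
{
	uint32_t fill = (x >> 31) ? 0xffffffffu : 0;	/* MSB replicated */

	if (amt == 0)
		return x;	/* avoid an undefined shift by 32 below */
	return (x >> amt) | (fill << (32 - amt));
}
```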
|
#
991f5b36 |
| 18-May-2018 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X)
For indirect shifts, the shift amount is not specified as a constant. NFP needs to get the shift amount through the low 5 bits of the source A operand in PREV_ALU, therefore extra instructions are needed compared with shifts by constants.
Because NFP is 32-bit, we use a register pair for 64-bit shifts and therefore need different instruction sequences depending on whether the shift amount is less than 32 or not.
An NFP branch-on-bit-test instruction emitter is added by this patch and is used for an efficient runtime check on the shift amount. We treat the shift amount as less than 32 if bit 5 is clear, and as greater than or equal to 32 otherwise. A shift amount greater than or equal to 64 results in undefined behavior.
This patch also uses range info to avoid generating unnecessary runtime code when we know for certain whether the shift amount is less than 32.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
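The two instruction sequences and the bit-5 test can be illustrated with a 64-bit left shift carried out on a pair of 32-bit halves. This is a sketch in portable C under the assumption that the amount is below 64; `struct reg_pair` and `shl64_pair` are hypothetical names, and the extra `amt == 0` branch exists only to avoid an undefined 32-bit shift in C.

```c
#include <stdint.h>

struct reg_pair { uint32_t lo, hi; };

/* 64-bit left shift on a 32-bit register pair; amt must be below 64. */
static struct reg_pair shl64_pair(struct reg_pair v, uint32_t amt)
{
	struct reg_pair r;

	if (amt & 32) {
		/* Bit 5 set: amount is 32..63, low half moves into high half. */
		r.hi = v.lo << (amt & 31);
		r.lo = 0;
	} else if (amt == 0) {
		r = v;
	} else {
		/* Bit 5 clear: amount is 1..31, bits cross between halves. */
		r.hi = (v.hi << amt) | (v.lo >> (32 - amt));
		r.lo = v.lo << amt;
	}
	return r;
}
```

When range info proves which side of 32 the amount falls on, only one of the two main branches would need to be generated, which is the saving the last paragraph of the message refers to.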
|
#
d985888f |
| 08-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: support setting the RX queue index
BPF has access to all internal FW datapath structures, including the structure containing RX queue selection. With a little coordination with the datapath we can let the offloaded BPF select the RX queue. We just need a way to tell the datapath that queue selection has already been done and it shouldn't overwrite it. Define a bit to tell the datapath that BPF already selected a queue (QSEL_SET). If the selected queue is not enabled (>= number of enabled queues), the datapath will perform normal RSS.
BPF queue selection on the NIC can be used to replace standard datapath RSS with fully programmable BPF/XDP RSS.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
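The datapath decision described here can be sketched as follows. Only the name QSEL_SET comes from the message; the bit position, the mask, the field layout and the function `select_rx_queue` are assumptions made for illustration and say nothing about the real firmware structure.

```c
#include <stdint.h>

/* Assumed encoding: top bit flags "BPF chose a queue", low bits hold the index. */
#define QSEL_SET	(1u << 31)
#define QSEL_MASK	(QSEL_SET - 1)

/* Pick the BPF-selected queue when it is flagged and enabled,
 * otherwise fall back to the queue normal RSS would have chosen.
 */
static uint32_t select_rx_queue(uint32_t qsel, uint32_t n_enabled,
				uint32_t rss_queue)
{
	uint32_t queue = qsel & QSEL_MASK;

	if ((qsel & QSEL_SET) && queue < n_enabled)
		return queue;
	return rss_queue;
}
```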
|
#
b4264c96 |
| 03-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: rewrite map pointers with NFP TIDs
Kernel will now replace map fds with actual pointer before calling the offload prepare. We can identify those pointers and replace them with NFP table IDs instead of loading the table ID in code generated for CALL instruction.
This allows us to support having the same CALL being used with different maps.
Since we don't want to change the FW ABI we still need to move the TID from R1 to portion of R0 before the jump.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
9816dd35 |
| 03-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: perf event output helpers support
Add support for the perf_event_output family of helpers.
The implementation on the NFP will not match the host code exactly. The state of the host map and rings is unknown to the device, hence device can't return errors when rings are not installed. The device simply packs the data into a firmware notification message and sends it over to the host, returning success to the program.
There is no notion of a host CPU on the device when packets are being processed. Device will only offload programs which set BPF_F_CURRENT_CPU. Still, if map index doesn't match CPU no error will be returned (see above).
Dropped/lost firmware notification messages will not cause "lost events" event on the perf ring, they are only visible via device error counters.
Firmware notification messages may also get reordered in respect to the packets which caused their generation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
7bdc97be |
| 24-Apr-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: optimize comparisons to negative constants
Comparison instruction requires a subtraction. If the constant is negative we are more likely to fit it into a NFP instruction directly if we change the sign and use addition.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
61dd8f00 |
| 24-Apr-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: tabularize generations of compare operations
There are quite a few compare instructions now, use a table to translate BPF instruction code to NFP instruction parameters instead of parameterizing helpers. This saves LOC and makes future extensions easier.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
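A table-driven translation of this kind might look like the sketch below. The BPF opcode values are the standard ones for the unsigned jumps; the branch condition names, `struct cmp_params` and `branch_taken` are placeholders for illustration, not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operation bits of the unsigned BPF conditional jumps. */
#define BPF_JGT	0x20
#define BPF_JGE	0x30
#define BPF_JLT	0xa0
#define BPF_JLE	0xb0

/* Placeholder branch conditions evaluated after a subtraction. */
enum br_cond { BR_LOWER, BR_HIGHER_OR_SAME };

struct cmp_params {
	enum br_cond cond;
	bool swap;	/* swap the operands before comparing */
};

/* One table row per opcode replaces a family of parameterized helpers. */
static const struct cmp_params cmp_table[] = {
	[BPF_JGT >> 4] = { BR_LOWER,          true  },
	[BPF_JGE >> 4] = { BR_HIGHER_OR_SAME, false },
	[BPF_JLT >> 4] = { BR_LOWER,          false },
	[BPF_JLE >> 4] = { BR_HIGHER_OR_SAME, true  },
};

/* Evaluate "a <op> b" through the table, as generated code would. */
static bool branch_taken(uint8_t code, uint32_t a, uint32_t b)
{
	const struct cmp_params *p = &cmp_table[code >> 4];
	uint32_t x = p->swap ? b : a;
	uint32_t y = p->swap ? a : b;

	return p->cond == BR_LOWER ? x < y : x >= y;
}
```

Adding a new comparison then means adding a table row rather than another helper, which is the kind of extension the message has in mind.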
|
#
6c59500c |
| 24-Apr-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: optimize add/sub of a negative constant
NFP instruction set can fit small immediates into the instruction. Negative integers, however, will never fit because they will have highest bit set. If we swap the ALU op between ADD and SUB and negate the constant we have a better chance of fitting small negative integers into the instruction itself and saving one or two cycles.
    immed[gprB_21, 0xfffffffc]
    alu[gprA_4, gprA_4, +, gprB_21], gpr_wrboth
    immed[gprB_21, 0xffffffff]
    alu[gprA_5, gprA_5, +carry, gprB_21], gpr_wrboth
now becomes:
    alu[gprA_4, gprA_4, -, 4], gpr_wrboth
    alu[gprA_5, gprA_5, -carry, 0], gpr_wrboth
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
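The rewrite itself is a small peephole step: flip ADD and SUB and negate the immediate. The sketch below models it on a toy instruction type; `struct alu_insn`, `fold_negative_imm` and `apply` are hypothetical, and the check for the most negative value is an assumption added because that value cannot be negated in 32 bits.

```c
#include <stdint.h>

enum alu_op { ALU_ADD, ALU_SUB };

struct alu_insn {
	enum alu_op op;
	int32_t imm;
};

/* Turn "add negative" into "sub positive" and vice versa. */
static struct alu_insn fold_negative_imm(struct alu_insn insn)
{
	if (insn.imm < 0 && insn.imm != INT32_MIN) {
		insn.op = insn.op == ALU_ADD ? ALU_SUB : ALU_ADD;
		insn.imm = -insn.imm;
	}
	return insn;
}

/* Reference semantics: the immediate is sign extended to 64 bits. */
static uint64_t apply(struct alu_insn insn, uint64_t x)
{
	uint64_t imm = (uint64_t)(int64_t)insn.imm;

	return insn.op == ALU_ADD ? x + imm : x - imm;
}
```

For an add of -4, the folded form is a subtract of 4, matching the listing above, and both forms produce the same 64-bit result.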
|
#
9c9e5323 |
| 24-Apr-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: remove double space
Whitespace cleanup - remove double space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
Revision tags: v4.16 |
|
#
df4a37d8 |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add support for bpf_get_prandom_u32()
NFP has a prng register, which we can read to obtain a u32 worth of pseudo random data. Generate code for it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
41aed09c |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add support for atomic add of unknown values
Allow atomic add to be used even when the value is not guaranteed to fit into a 16 bit immediate. This requires the value to be pulled as data, and therefore use of a transfer register and a context swap.
Track the information about possible lengths of the value, if it's guaranteed to be larger than 16bits don't generate the code for the optimized case at all.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
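One way to picture the tracking is as a choice between code paths based on the known range of the value. The sketch assumes the range is available as unsigned bounds; the enum, the function name and the "immediate only" case are illustrative inferences, since the message only states that the optimized case is skipped when the value is guaranteed to exceed 16 bits.

```c
#include <stdint.h>

enum xadd_paths {
	XADD_IMM_ONLY,	/* value always fits a 16-bit immediate */
	XADD_DATA_ONLY,	/* value never fits: skip the optimized case */
	XADD_BOTH,	/* unknown: emit both paths plus a runtime check */
};

/* Decide which code paths to generate from the value's unsigned bounds. */
static enum xadd_paths xadd_pick_paths(uint64_t umin, uint64_t umax)
{
	if (umax <= UINT16_MAX)
		return XADD_IMM_ONLY;
	if (umin > UINT16_MAX)
		return XADD_DATA_ONLY;
	return XADD_BOTH;
}
```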
|
#
b556ddd9 |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: expose command delay slots
Allow callers to control the delay slots of commands, instead of giving them just a wait/nowait choice.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
dcb0c27f |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add basic support for atomic adds
Implement atomic add operation for 32 and 64 bit values. Depend on the verifier to ensure alignment. Values have to be kept in big endian and swapped upon read/write. For now only support atomic add of a constant.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
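The endianness requirement means that a little-endian reader sees the counter byte-swapped and must swap on every read and write. The sketch below models that with a plain 32-bit word; `swab32` is written out by hand here and `add_swapped` is a hypothetical helper, neither taken from the driver.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Model: 'stored' is the byte-swapped form of a counter. Swap to get the
 * numeric value, add, then swap back to the stored form.
 */
static uint32_t add_swapped(uint32_t stored, uint32_t delta)
{
	return swab32(swab32(stored) + delta);
}
```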
|
#
bfee64de |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add map deletes from the datapath
Support calling the map_delete_elem() FW helper from the datapath programs. For the JIT, checks and code are basically equivalent to map lookups. Similarly to other map helpers, the key must be on the stack. Different pointer types are left for future extension.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
44d65a47 |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add map updates from the datapath
Support calling map_update_elem() from the datapath programs by calling into a FW-provided helper. The value pointer is passed in LM pointer #2. Keeping track of old state for arg3 is not necessary, since LM pointer #2 will always be loaded in this case; the trivial optimization for a value at the bottom of the stack can't be done here.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2f46e0c1 |
| 28-Mar-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add helper for validating stack pointers
Our implementation has restrictions on stack pointers for function calls. Move the common checks into a helper for reuse. The state has to be encapsulated into a structure to support parameters other than BPF_REG_2.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|