#
854dc87d |
| 30-Nov-2017 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support backward jump
This patch adds support for backward jump on NFP.
- restrictions on backward jump in various functions have been removed.
- nfp_fixup_branches now supports backward jump.
There is one thing to note, currently an input eBPF JMP insn may generate several NFP insns, for example,
    NFP imm move insn A \
    NFP compare insn  B  --> 3 NFP insn jited from eBPF JMP insn M
    NFP branch insn   C /
    ---
    NFP insn X           --> 1 NFP insn jited from eBPF insn N
    ---
    ...
therefore, we do a sanity check to make sure the last jited insn from an eBPF JMP is an NFP branch instruction.
Once backward jumps are allowed, an eBPF JMP insn can be at the end of the program. This causes trouble for the sanity check, which requires the end index of the NFP insns jited from one eBPF insn. Before this patch only the start index was recorded, so the end index could only be obtained by:
start_index_of_the_next_eBPF_insn - 1
or for the above example:
start_index_of_eBPF_insn_N (which is the index of NFP insn X) - 1
nfp_fixup_branches was using nfp_for_each_insn_walk2 to expose the *next* insn to each iteration during the traversal, so that the last index could be calculated from it. Now it needs some extra code to handle the last insn. Meanwhile, the use of walk2 is actually unnecessary: we can simply use a generic single instruction walk, and the next insn can easily be obtained using list_next_entry.
So this patch migrates the jump fixup traversal to *list_for_each_entry*, which simplifies the code logic a little.
The other thing to note is that a new state variable "last_bpf_off" is introduced to track the index of the last jited NFP insn. This is necessary because NFP generates special-purpose epilogue sequences, so the index of the last jited NFP insn is *not* always nfp_prog->prog_len - 1.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
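The end-index calculation described in the commit message above can be sketched roughly as follows. This is only an illustration, not the driver's code: `struct insn_meta`, `insn_end_off()` and their fields are hypothetical stand-ins, and only the name `last_bpf_off` is taken from the message.

```c
#include <stddef.h>

/* Hypothetical, simplified per-eBPF-insn record; the real driver
 * structure is not shown in the commit message. */
struct insn_meta {
	unsigned int start_off; /* index of the first NFP insn jited for it */
};

/* Index of the last NFP insn jited from metas[i].
 * For all but the last eBPF insn this is the next insn's start index
 * minus one. The last insn has no successor, so the separately tracked
 * last_bpf_off is used instead (it may differ from prog_len - 1 when an
 * epilogue follows). */
unsigned int insn_end_off(const struct insn_meta *metas, size_t n, size_t i,
			  unsigned int last_bpf_off)
{
	if (i + 1 < n)
		return metas[i + 1].start_off - 1;
	return last_bpf_off;
}
```

With this shape, a trailing eBPF JMP needs no special-case lookahead: the sanity check can inspect the NFP insn at the returned index for every eBPF insn, including the last one.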
|
Revision tags: v4.13.16, v4.14 |
|
#
c6c580d7 |
| 03-Nov-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: move to new BPF program offload infrastructure
Following steps are taken in the driver to offload an XDP program:
XDP_SETUP_PROG:
 * prepare:
   - allocate program state;
   - run verifier (bpf_analyzer());
   - run translation;
 * load:
   - stop old program if needed;
   - load program;
   - enable BPF if not enabled;
 * clean up:
   - free program image.
With new infrastructure the flow will look like this:
BPF_OFFLOAD_VERIFIER_PREP:
 - allocate program state;
BPF_OFFLOAD_TRANSLATE:
 - run translation;
XDP_SETUP_PROG:
 - stop old program if needed;
 - load program;
 - enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
 - free program image.
Take advantage of the new infrastructure. Allocation of driver metadata has to be moved from jit.c to offload.c since it's now done at a different stage. Since there is no separate driver private data for verification step, move temporary nfp_meta pointer into nfp_prog. We will now use user space context offsets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9314c442 |
| 03-Nov-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: move translation prepare to offload.c
struct nfp_prog is currently only used internally by the translator. This means there is a lot of parameter passing going on, between the translator and different stages of offload. Simplify things by allocating nfp_prog in offload.c already.
We will now use kmalloc() to allocate the program area and only DMA map it for the time of loading (instead of allocating DMA coherent memory upfront).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1c88eae |
| 03-Nov-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: move program prepare and free into offload.c
Most of offload/translation prepare logic will be moved to offload.c. To help git generate more reasonable diffs move nfp_prog_prepare() and nfp_prog_free() functions there as a first step.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94508438 |
| 03-Nov-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: remove the register renumbering leftovers
The register renumbering was removed and will not be coming back in its old, naive form, given that it would be fundamentally incompatible with calling functions. Remove the leftovers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
012bb8a8 |
| 03-Nov-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: drop support for cls_bpf with legacy actions
Only support BPF_PROG_TYPE_SCHED_CLS programs in direct action mode. This simplifies preparing the offload since there will now be only one mode of operation for that type of program. We need to know the attachment mode type of cls_bpf programs, because exit codes are interpreted differently for legacy vs DA mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
254ef4d7 |
| 01-Nov-2017 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: support [BPF_ALU | BPF_ALU64] | BPF_NEG
This patch supports BPF_NEG under both BPF_ALU64 and BPF_ALU. LLVM recently started generating it.
NOTE: BPF_NEG takes a single operand, which is a register and serves as both input and output.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
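As a reference for the semantics only (not the NFP instruction sequence the patch emits), BPF_NEG can be modelled in plain C as below. The function names are made up for illustration; the zeroing of the upper half in the 32-bit case follows the general eBPF rule for BPF_ALU operations.

```c
#include <stdint.h>

/* BPF_ALU64 | BPF_NEG: dst = -dst over all 64 bits.
 * Unsigned subtraction wraps, so there is no overflow UB. */
uint64_t bpf_neg64(uint64_t dst)
{
	return 0 - dst;
}

/* BPF_ALU | BPF_NEG: negate the low 32 bits; the upper 32 bits of the
 * destination register end up as zero. */
uint64_t bpf_neg32(uint64_t dst)
{
	return (uint64_t)(uint32_t)(0 - (uint32_t)dst);
}
```

The single `dst` argument mirrors the note above: the same register is both the input and the output.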
|
#
5d42ced1 |
| 01-Nov-2017 |
Jiong Wang <jiong.wang@netronome.com> |
nfp: bpf: rename ALU_OP_NEG to ALU_OP_NOT
The current ALU_OP_NEG is Op encoding 0x4 for the NFP ALU instruction. It actually performs the "~B" operation, which is bitwise NOT.
The name ALU_OP_NEG is misleading, as NEG is -B, which is not the same as ~B.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
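The difference between the two operations is easy to see in C. This is a small illustration with made-up function names; it relies only on two's complement arithmetic, where -B equals ~B + 1.

```c
#include <stdint.h>

/* Bitwise NOT: what the renamed ALU_OP_NOT encoding computes ("~B"). */
uint32_t alu_not(uint32_t b)
{
	return ~b;
}

/* Arithmetic negation ("-B"), expressed via NOT plus one. */
uint32_t alu_neg(uint32_t b)
{
	return ~b + 1u;
}
```

For B = 1, NOT gives 0xfffffffe while NEG gives 0xffffffff, so using the NEG name for the NOT encoding would be off by one for every input.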
|
#
9f16c8ab |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: optimize mov64 a little
Loading 64bit constants requires up to 4 load immediates, since we can only load 16 bits at a time. If the 32bit halves of the 64bit constant are the same, however, we can save a cycle by doing a register move instead of two loads of 16 bits.
Note that we don't optimize the normal ALU64 load because, even though it's a 64 bit load, the upper half of the register comes from sign extension, so we can load it in one cycle anyway.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
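The saving can be illustrated with a deliberately naive cost model. It assumes a 32-bit half costs one immediate load if it fits in 16 bits and two otherwise, and that a register move costs one cycle. The real immediate encoding may be smarter than this, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed cost of materialising one 32-bit half, 16 bits per load. */
static unsigned int loads_for_half(uint32_t half)
{
	return (half >> 16) ? 2 : 1;
}

/* Estimated cycles to load a 64-bit constant under the model above. */
unsigned int mov64_cost(uint64_t imm)
{
	uint32_t lo = (uint32_t)imm;
	uint32_t hi = (uint32_t)(imm >> 32);
	unsigned int lo_cost = loads_for_half(lo);
	unsigned int hi_cost = loads_for_half(hi);

	/* Identical halves: copy the low register instead of loading the
	 * high half again, which only helps if that took two loads. */
	if (lo == hi && hi_cost > 1)
		hi_cost = 1;

	return lo_cost + hi_cost;
}
```

Under this model a constant such as 0x1234567812345678 drops from four cycles to three, matching the "save a cycle" claim.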
|
#
b14157ee |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: support stack accesses via non-constant pointers
If stack pointer has a different value on different paths but the alignment to words (4B) remains the same, we can set a new LMEM access pointer to the calculated value and access whichever word it's pointing to.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2df03a50 |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: support accessing the stack beyond 64 bytes
To access beyond 64th byte of the stack we need to set a new stack pointer register (LMEM is accessed indirectly through those pointers). Add a function for encoding local CSR access instruction. Use stack pointer number 3.
Note that stack pointer registers allow us to index into 32 bytes of LMEM (with shift operations i.e. when operands are restricted). This means if access is crossing 32 byte boundary we must not use offsetting, we have to set the pointer to the exact address and move it with post-increments.
We depend on the datapath placing the stack base address in GPR A22 for our use.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
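The boundary condition mentioned above can be expressed as a small predicate. This is a sketch with a hypothetical name, and it assumes the 32-byte windows are aligned to multiples of 32 bytes from the stack base.

```c
#include <stdbool.h>

/* True if the access [off, off + size) touches more than one 32-byte
 * window, i.e. plain offsetting from the window base cannot be used. */
bool crosses_32b_window(unsigned int off, unsigned int size)
{
	return size && (off / 32) != ((off + size - 1) / 32);
}
```

An 8-byte access at offset 24 stays within one window, while the same access at offset 28 crosses into the next and would need the pointer set to the exact address and advanced with post-increments.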
|
#
d3488480 |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: allow stack accesses via modified stack registers
As long as the verifier tells us the stack offset exactly we can render the LMEM reads quite easily. Simply make sure that the offset is constant for a given instruction and add it to the instruction's offset.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a90c83c |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: optimize the RMW for stack accesses
When we are performing unaligned stack accesses in the 32-64B window we have to do a read-modify-write cycle. E.g. for reading 8 bytes from address 17:
    0: tmp = stack[16]
    1: gprLo = tmp >> 8
    2: tmp = stack[20]
    3: gprLo |= tmp << 24
    4: tmp = stack[20]
    5: gprHi = tmp >> 8
    6: tmp = stack[24]
    7: gprHi |= tmp << 24
The load on line 4 is unnecessary, because tmp already contains data from stack[20].
For write we can optimize both loads and writebacks away.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
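The read sequence from the commit message above can be simulated in C to make the saved load visible. This is a sketch, not the JIT's output: the stack is modelled as an array of 32-bit words with byte 0 in the least significant bits, and `read8_unaligned()` and its load counter are hypothetical.

```c
#include <stdint.h>

/* Read 8 bytes starting at byte address addr using only aligned,
 * word-wide loads. *n_loads is incremented once per word load. */
uint64_t read8_unaligned(const uint32_t *stack, unsigned int addr,
			 unsigned int *n_loads)
{
	unsigned int idx = addr / 4;
	unsigned int shift = (addr % 4) * 8;
	uint32_t tmp, lo, hi;

	if (shift == 0) {
		/* Aligned case: two plain word loads. */
		lo = stack[idx];
		hi = stack[idx + 1];
		*n_loads += 2;
		return ((uint64_t)hi << 32) | lo;
	}

	tmp = stack[idx];          (*n_loads)++;
	lo = tmp >> shift;
	tmp = stack[idx + 1];      (*n_loads)++;
	lo |= tmp << (32 - shift);
	hi = tmp >> shift;         /* reuse tmp, no second load of this word */
	tmp = stack[idx + 2];      (*n_loads)++;
	hi |= tmp << (32 - shift);

	return ((uint64_t)hi << 32) | lo;
}
```

For the example address 17 this performs three word loads rather than four, because the middle word feeds both halves of the result.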
|
#
a82b23fb |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add stack read support
Add simple stack read support, similar to write in every aspect, but with data flowing the other way. Note that, unlike writes, which can be done in smaller-than-word quantities, when registers are loaded with less than a word of stack contents the values have to be zero extended.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
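Zero extension of a sub-word read amounts to a shift followed by a mask. The helper below is a hypothetical illustration that assumes byte 0 lives in the least significant bits of the word and that the access stays within one word (byte_off + size <= 4).

```c
#include <stdint.h>

/* Extract size bytes starting at byte_off from a 32-bit stack word,
 * zero extending the result to the full register width. */
uint32_t read_subword(uint32_t word, unsigned int byte_off, unsigned int size)
{
	uint32_t v = word >> (byte_off * 8);

	if (size < 4)
		v &= (1u << (size * 8)) - 1; /* clear bits above the value */
	return v;
}
```

Without the mask, reading one byte at offset 1 of 0xaabbccdd would leave 0x00aabbcc in the register instead of 0xcc.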
|
#
ee9133a8 |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add stack write support
Stack is implemented by the LMEM register file. Unaligned accesses to LMEM are not allowed. Accesses also have to be 4B wide.
To support stack we need to make sure offsets of pointers are known at translation time (for now) and perform correct load/mask/shift operations.
Since we can access the first 64B of LMEM without much effort, support only stacks no bigger than 64B. Following commits will extend the possible sizes beyond that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
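Because accesses must be 4B wide, storing a narrower value means merging it into the existing word. A minimal sketch of that load/mask/shift step, with a hypothetical helper name and assuming byte 0 is the least significant byte and byte_off + size <= 4:

```c
#include <stdint.h>

/* Return the word that results from writing the low size bytes of val
 * at byte_off into word, leaving the other bytes untouched. */
uint32_t write_subword(uint32_t word, unsigned int byte_off,
		       unsigned int size, uint32_t val)
{
	uint32_t mask = size < 4 ? (1u << (size * 8)) - 1 : 0xffffffffu;
	unsigned int shift = byte_off * 8;

	return (word & ~(mask << shift)) | ((val & mask) << shift);
}
```

The byte offset has to be known when the code is generated for the mask and shift to be constants, which is why the message requires pointer offsets to be known at translation time.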
|
#
ff42bb9f |
| 23-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add helper for emitting nops
The need to emit a few nops will become more common soon as we add stack and map support. Add a helper. This allows the code to be shorter, and may also be handy for marking the nops with a "reason" to ease applying optimizations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfddbc8a |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: support direct packet access in TC
Add support for direct packet access in TC. Note that because writing the packet will cause the verifier to generate a csum fixup prologue, we won't be able to offload packet writes from TC just yet; only the reads will work.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e663fe38 |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: direct packet access - write
This patch adds ability to write packet contents using pre-validated packet pointers (direct packet access).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ca71441 |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add support for direct packet access - read
In direct packet access bound checks are already done, we can simply dereference the packet pointer.
Verifier/parser logic needs to record pointer type. Note that although verifier does protect us from CTX vs other pointer changes we will also want to differentiate between PACKET vs MAP_VALUE or STACK, so we can add the check already.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a793977 |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: separate I/O from checks for legacy data load
Move data load into a separate function and separate it from packet length checks of legacy I/O. This makes the code more readable and easier to reuse.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
943c57b9 |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: fix context accesses
Sizes of fields in struct xdp_md/xdp_buff and some in sk_buff depend on target architecture. Take that into account and use struct xdp_buff, not struct xdp_md.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3119d1fd |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: implement byte swap instruction
Implement byte swaps with rotations, shifts and byte loads. Remember to clear upper parts of the 64 bit registers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
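One well-known way to build a byte swap from rotations and masks is sketched below. It is not necessarily the instruction sequence the patch emits, and the function names are hypothetical. The 16- and 32-bit cases clear the upper part of the 64-bit register, as the message reminds.

```c
#include <stdint.h>

/* Rotate right; n must be in the range 1..31. */
static uint32_t rotr32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* Swap the four bytes of x using two rotations and two masks. */
uint32_t bswap32(uint32_t x)
{
	return (rotr32(x, 8) & 0xff00ff00u) | (rotr32(x, 24) & 0x00ff00ffu);
}

/* Byte swap the low `bits` (16, 32 or 64) of a 64-bit register value. */
uint64_t bpf_bswap(uint64_t reg, int bits)
{
	uint32_t lo = (uint32_t)reg;
	uint32_t hi = (uint32_t)(reg >> 32);

	switch (bits) {
	case 16:
		return ((lo & 0xffu) << 8) | ((lo >> 8) & 0xffu);
	case 32:
		return bswap32(lo); /* upper 32 bits become zero */
	default:
		/* 64 bits: swap each half and exchange the halves. */
		return ((uint64_t)bswap32(lo) << 32) | bswap32(hi);
	}
}
```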
|
#
c000dfb5 |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add mov helper
Register move operation is encoded as alu no op. This means that one has to specify number of unused/none parameters to the emit_alu(). Add a helper.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26fa818d |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: fix compare instructions
Now that we have BPF assembler support in LLVM 6 we can easily test all compare instructions (LLVM 4 didn't generate most of them from C). Fix the compare to immediates and refactor the order of compare to regs to make sure they both follow the same pattern.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82837370 |
| 12-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
nfp: bpf: add missing return in jne_imm optimization
We optimize comparisons to immediate 0 as if (reg.lo | reg.hi). The early return statement was missing, however, which means we would generate two comparisons - optimized one followed by a normal 2x 32 bit compare.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
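The shape of the optimization and of the fix can be modelled in C as follows. This mirrors the logic only: the real code emits NFP instructions rather than evaluating the condition, and `jne_imm()` here is an illustrative function, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of "jump if reg != imm" on a 64-bit register held as two
 * 32-bit halves; imm is sign extended to 64 bits as in eBPF. */
bool jne_imm(uint64_t reg, int32_t imm)
{
	uint32_t lo = (uint32_t)reg;
	uint32_t hi = (uint32_t)(reg >> 32);
	uint64_t ext = (uint64_t)(int64_t)imm;

	if (imm == 0)
		return (lo | hi) != 0; /* early return: one test suffices */

	/* General case: compare both 32-bit halves. */
	return lo != (uint32_t)ext || hi != (uint32_t)(ext >> 32);
}
```

The early return in the zero case corresponds to the statement that was missing: without it, the general two-half comparison would be generated after the optimized one.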
|