90497e03 | 08-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Avoid the constant pool in tcg_out_movi
Load constants in no more than two insns, which turns out to be faster than using the constant pool.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
1818c71b | 08-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Cleanup tcg_out_movi
Merge maybe_out_small_movi, as it no longer has additional users. Use is_const_p{16,32}.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
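The helper names is_const_p{16,32} suggest predicates for constants whose set bits fit in a single aligned part of a 64-bit value. The following is a minimal sketch under that assumption only; the function names `const_part16` and `const_part32` are hypothetical and this is not the code from the commit.

```c
#include <stdint.h>

/*
 * Hypothetical model: return the index (0..3) of the single aligned
 * 16-bit part that holds every set bit of val, or -1 if the set bits
 * span more than one part.  Zero trivially fits in part 0.
 */
static int const_part16(uint64_t val)
{
    for (int i = 0; i < 4; i++) {
        uint64_t mask = 0xffffull << (i * 16);
        if ((val & ~mask) == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * Same idea for the two aligned 32-bit halves: 0 for the low half,
 * 1 for the high half, -1 if both halves contain set bits.
 */
static int const_part32(uint64_t val)
{
    if ((val & 0xffffffff00000000ull) == 0) {
        return 0;
    }
    if ((val & 0x00000000ffffffffull) == 0) {
        return 1;
    }
    return -1;
}
```

A constant that passes either check could plausibly be materialized by one insert-style or load-style immediate instruction; which instruction is chosen is left to the backend.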
|
32c256ed | 08-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Tighten constraints for 64-bit compare
Give 64-bit comparison second operand a signed 33-bit immediate. This is the smallest superset of uint32_t and int32_t, as used by CLGFI and CGFI respectively. The rest of the 33-bit space can be loaded into TCG_TMP0. Drop use of the constant pool.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
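The range argument in this message can be checked with a small model. This sketch only illustrates why a signed 33-bit range covers both uint32_t and int32_t; the helper names are hypothetical and not taken from the backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range accepted by an unsigned 32-bit immediate (the CLGFI case). */
static bool fits_u32(int64_t v)
{
    return v >= 0 && v <= (int64_t)UINT32_MAX;
}

/* Range accepted by a signed 32-bit immediate (the CGFI case). */
static bool fits_s32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Signed 33-bit range: [-2^32, 2^32 - 1]. */
static bool fits_s33(int64_t v)
{
    return v >= -((int64_t)1 << 32) && v < ((int64_t)1 << 32);
}
```

Values such as -2^32 fit the 33-bit range but neither 32-bit form, which corresponds to the case the message says is loaded into TCG_TMP0.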
|
29a5ea73 | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Implement ctpop operation
There is an older form that produces per-byte results, and a newer form that produces per-register results.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
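To illustrate the difference between per-byte and per-register results, here is a portable C model. The multiply-based reduction is a common bit trick chosen for the sketch; it is an assumption for illustration and not necessarily the sequence the backend emits. Function names are hypothetical.

```c
#include <stdint.h>

/*
 * Per-byte form: each byte of the result holds the number of set bits
 * in the corresponding byte of the input.
 */
static uint64_t popcnt_per_byte(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0full;
    return x;
}

/*
 * Per-register form: sum the eight byte counts.  Multiplying by
 * 0x0101...01 accumulates all bytes into the top byte; the total is at
 * most 64, so it cannot overflow that byte.
 */
static unsigned popcnt_total(uint64_t x)
{
    return (unsigned)((popcnt_per_byte(x) * 0x0101010101010101ull) >> 56);
}
```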
|
bfff8518 | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Use tgen_movcond_int in tgen_clz
Reuse code from movcond to conditionally copy a2 to dest, based on the condition codes produced by FLOGR.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
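The conditional copy described here can be modeled in plain C. This is a sketch of the semantics only: `model_flogr` stands in for FLOGR, returning the leading-zero count plus a flag that plays the role of the condition code, and `model_clz` returns a2 when the input is zero. All names are hypothetical.

```c
#include <stdint.h>

struct flogr_result {
    unsigned count;  /* position of the leftmost one bit, or 64 if none */
    int found;       /* stand-in for the condition code: 1 if a bit was found */
};

static struct flogr_result model_flogr(uint64_t x)
{
    struct flogr_result r = { 64, 0 };

    for (unsigned i = 0; i < 64; i++) {
        if (x & (0x8000000000000000ull >> i)) {
            r.count = i;
            r.found = 1;
            break;
        }
    }
    return r;
}

/* clz with a fallback: dest = a1 ? clz(a1) : a2. */
static uint64_t model_clz(uint64_t a1, uint64_t a2)
{
    struct flogr_result r = model_flogr(a1);

    /* The "conditionally copy a2 to dest" step, keyed on the flag. */
    return r.found ? r.count : a2;
}
```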
|
0bbf0f7a | 01-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Support SELGR instruction in movcond
The new select instruction provides two separate register inputs, whereas the old load-on-condition instruction overlaps one of the register inputs with the destination.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
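The operand difference between the two instruction styles can be sketched as follows. This models semantics only, with hypothetical function names; it is not the backend's code generation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Load-on-condition style: dest is both an input and the output. */
static void model_load_on_cond(uint64_t *dest, uint64_t src, bool cond)
{
    if (cond) {
        *dest = src;
    }
}

/* Select style: two independent inputs, separate result. */
static uint64_t model_select(uint64_t if_true, uint64_t if_false, bool cond)
{
    return cond ? if_true : if_false;
}

/*
 * Building a three-operand movcond from load-on-condition needs an
 * extra move to seed dest when dest is not already one of the inputs.
 */
static uint64_t movcond_via_load_on_cond(uint64_t v_true, uint64_t v_false,
                                         bool cond)
{
    uint64_t dest = v_false;               /* the extra move */
    model_load_on_cond(&dest, v_true, cond);
    return dest;
}
```

Both paths compute the same value; the select form simply avoids the seeding move when the destination differs from both inputs.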
|
23d1394a | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Generalize movcond implementation
Generalize movcond to support pre-computed conditions, and the same set of arguments at all times. This will be assumed by a following patch, which needs to reuse tgen_movcond_int.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
5c837bbc | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Create tgen_cmp2 to simplify movcond
Return both regular and inverted condition codes from tgen_cmp2. This lets us choose after the fact which comparison we want.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
6c9b5c0f | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Support MIE3 logical operations
This is andc, orc, nand, nor, eqv. We can use nor for implementing not.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
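For reference, the five operations and the nor-based not can be written as C identities. This is a sketch of the operation semantics as they are usually defined for these op names; the `op_*` function names are hypothetical.

```c
#include <stdint.h>

static uint64_t op_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t op_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t op_nand(uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t op_nor(uint64_t a, uint64_t b)  { return ~(a | b); }
static uint64_t op_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }

/* not(a) falls out of nor with both inputs equal: ~(a | a) == ~a. */
static uint64_t op_not(uint64_t a)
{
    return op_nor(a, a);
}
```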
|
4134083f | 08-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Tighten constraints for and_i64
Let the register allocator handle such immediates by matching only what one insn can achieve.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
b2509acc | 08-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Tighten constraints for or_i64 and xor_i64
Drop support for sequential OR and XOR, as the serial dependency is slower than loading the constant first. Let the register allocator handle such immediates by matching only what one insn can achieve.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
a0332aca | 08-Dec-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Issue XILF directly for xor_i32
There is only one instruction that is applicable to a 32-bit immediate xor.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
668ce343 | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Support MIE2 MGRK instruction
The MIE2 facility adds a 3-operand signed 64x64->128 multiply.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
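A signed 64x64->128 multiply can be modeled in C with the `__int128` extension, which is assumed here to be available (GCC and Clang provide it on 64-bit targets). The struct and function names are hypothetical, and the sketch describes the arithmetic result only, not the instruction's register pairing.

```c
#include <stdint.h>

struct mul128 {
    uint64_t lo;  /* low 64 bits of the product */
    int64_t hi;   /* high 64 bits, carrying the sign */
};

static struct mul128 model_muls2(int64_t a, int64_t b)
{
    /* Widen before multiplying so the full 128-bit product is kept. */
    __int128 p = (__int128)a * b;
    struct mul128 r;

    r.lo = (uint64_t)p;
    /* Right shift of a negative __int128 is arithmetic on GCC/Clang. */
    r.hi = (int64_t)(p >> 64);
    return r;
}
```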
|
92c89a07 | 23-Feb-2022 |
Richard Henderson <richard.henderson@linaro.org> |
tcg/s390x: Support MIE2 multiply single instructions
The MIE2 facility adds 3-operand versions of multiply.
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|