Revision tags: v9.1.0 |
|
#
a7e10fab |
| 23-May-2024 |
Chinmay Rath <rathc@linux.ibm.com> |
target/ppc: Move VMX integer add/sub saturate insns to decodetree.
Moving the following instructions to decodetree specification :
v{add,sub}{u,s}{b,h,w}s : VX-form
The changes were verified by
target/ppc: Move VMX integer add/sub saturate insns to decodetree.
Moving the following instructions to decodetree specification :
v{add,sub}{u,s}{b,h,w}s : VX-form
The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
show more ...
|
#
948e257c |
| 23-Apr-2024 |
Chinmay Rath <rathc@linux.ibm.com> |
target/ppc: Move logical fixed-point instructions to decodetree.
Moving the below instructions to decodetree specification :
andi[s]., {ori, xori}[s] : D-form
{and, andc, nand, or, orc, nor, x
target/ppc: Move logical fixed-point instructions to decodetree.
Moving the below instructions to decodetree specification :
andi[s]., {ori, xori}[s] : D-form
{and, andc, nand, or, orc, nor, xor, eqv}[.], exts{b, h, w}[.], cnt{l, t}z{w, d}[.], popcnt{b, w, d}, prty{w, d}, cmp, bpermd : X-form
With this patch, all the fixed-point logical instructions have been moved to decodetree. The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
show more ...
|
#
ae556c6a |
| 23-Apr-2024 |
Chinmay Rath <rathc@linux.ibm.com> |
target/ppc: Move cmp{rb, eqb}, tw[i], td[i], isel instructions to decodetree.
Moving the following instructions to decodetree specification :
cmp{rb, eqb}, t{w, d} : X-form t{w, d}i : D-form is
target/ppc: Move cmp{rb, eqb}, tw[i], td[i], isel instructions to decodetree.
Moving the following instructions to decodetree specification :
cmp{rb, eqb}, t{w, d} : X-form t{w, d}i : D-form isel : A-form
The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured using the '-d in_asm,op' flag. Also for CMPRB, following review comments : Replaced repetition of arithmetic right shifting (tcg_gen_shri_i32) followed by extraction of last 8 bits (tcg_gen_ext8u_i32) with extraction of the required bits using offsets (tcg_gen_extract_i32).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
show more ...
|
#
f424bc10 |
| 23-Apr-2024 |
Chinmay Rath <rathc@linux.ibm.com> |
target/ppc: Move div/mod fixed-point insns (64 bits operands) to decodetree.
Moving the below instructions to decodetree specification :
divd[u, e, eu][o][.] : XO-form mod{sd, ud} : X-form
With
target/ppc: Move div/mod fixed-point insns (64 bits operands) to decodetree.
Moving the below instructions to decodetree specification :
divd[u, e, eu][o][.] : XO-form mod{sd, ud} : X-form
With this patch, all the fixed-point arithmetic instructions have been moved to decodetree. The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured using the '-d in_asm,op' flag. Also, remaned do_divwe method in fixedpoint-impl.c.inc to do_dive because it is now used to divide doubleword operands as well, and not just words.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
show more ...
|
#
a81b5c18 |
| 23-Apr-2024 |
Chinmay Rath <rathc@linux.ibm.com> |
target/ppc: Move neg, darn, mod{sw, uw} to decodetree.
Moving the below instructions to decodetree specification :
neg[o][.] : XO-form mod{sw, uw}, darn : X-form
The changes were verified
target/ppc: Move neg, darn, mod{sw, uw} to decodetree.
Moving the below instructions to decodetree specification :
neg[o][.] : XO-form mod{sw, uw}, darn : X-form
The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> [np: 32-bit compile fix] Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
show more ...
|
#
2871921d |
| 23-Apr-2024 |
Chinmay Rath <rathc@linux.ibm.com> |
target/ppc: Move divw[u, e, eu] instructions to decodetree.
Moving the following instructions to decodetree specification : divw[u, e, eu][o][.] : XO-form
The changes were verified by validating
target/ppc: Move divw[u, e, eu] instructions to decodetree.
Moving the following instructions to decodetree specification : divw[u, e, eu][o][.] : XO-form
The changes were verified by validating that the tcg ops generated by those instructions remain the same, which were captured with the '-d in_asm,op' flag.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Chinmay Rath <rathc@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
show more ...
|
#
668a6314 |
| 29-Sep-2023 |
Cédric Le Goater <clg@kaod.org> |
target/ppc: Rename variables to avoid local variable shadowing in VUPKPX
and fix such warnings :
../target/ppc/int_helper.c: In function ‘helper_vupklpx’: ../target/ppc/int_helper.c:2025:21: wa
target/ppc: Rename variables to avoid local variable shadowing in VUPKPX
and fix such warnings :
../target/ppc/int_helper.c: In function ‘helper_vupklpx’: ../target/ppc/int_helper.c:2025:21: warning: declaration of ‘r’ shadows a parameter [-Wshadow=local] 2025 | uint8_t r = (e >> 10) & 0x1f; \ | ^ ../target/ppc/int_helper.c:2033:1: note: in expansion of macro ‘VUPKPX’ 2033 | VUPKPX(lpx, UPKLO) | ^~~~~~ ../target/ppc/int_helper.c:2017:41: note: shadowed declaration is here 2017 | void helper_vupk##suffix(ppc_avr_t *r, ppc_avr_t *b) \ | ~~~~~~~~~~~^ ../target/ppc/int_helper.c:2033:1: note: in expansion of macro ‘VUPKPX’ 2033 | VUPKPX(lpx, UPKLO) | ^~~~~~
Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-ID: <20230929083143.234553-1-clg@kaod.org> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Markus Armbruster <armbru@redhat.com>
show more ...
|
#
7bdbf233 |
| 11-Jul-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use clmul_64
Use generic routine for 64-bit carry-less multiply.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
#
f56d3c1a |
| 11-Jul-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use clmul_32* routines
Use generic routines for 32-bit carry-less multiply.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
#
a2c67342 |
| 11-Jul-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use clmul_16* routines
Use generic routines for 16-bit carry-less multiply.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
#
cec4090d |
| 11-Jul-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use clmul_8* routines
Use generic routines for 8-bit carry-less multiply.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
#
73c19706 |
| 28-Aug-2023 |
Philippe Mathieu-Daudé <philmd@linaro.org> |
target/helpers: Remove unnecessary 'qemu/main-loop.h' header
"qemu/main-loop.h" declares functions related to QEMU's main loop mutex, which these files don't access. Remove the unused "qemu/main-loo
target/helpers: Remove unnecessary 'qemu/main-loop.h' header
"qemu/main-loop.h" declares functions related to QEMU's main loop mutex, which these files don't access. Remove the unused "qemu/main-loop.h" header.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230828221314.18435-8-philmd@linaro.org>
show more ...
|
#
af4cb945 |
| 02-Jun-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use aesdec_ISB_ISR_AK_IMC
This implements the VNCIPHER instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
target/ppc: Use aesdec_ISB_ISR_AK_IMC
This implements the VNCIPHER instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
show more ...
|
#
ce9f5b37 |
| 02-Jun-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use aesenc_SB_SR_MC_AK
This implements the VCIPHER instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Sign
target/ppc: Use aesenc_SB_SR_MC_AK
This implements the VCIPHER instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
show more ...
|
#
2cf44f3b |
| 02-Jun-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use aesdec_ISB_ISR_AK
This implements the VNCIPHERLAST instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
target/ppc: Use aesdec_ISB_ISR_AK
This implements the VNCIPHERLAST instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
show more ...
|
#
7df34e48 |
| 02-Jun-2023 |
Richard Henderson <richard.henderson@linaro.org> |
target/ppc: Use aesenc_SB_SR_AK
This implements the VCIPHERLAST instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Sig
target/ppc: Use aesenc_SB_SR_AK
This implements the VCIPHERLAST instruction.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
show more ...
|
Revision tags: v8.0.0, v7.2.0 |
|
#
26c964f8 |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Move VABSDU[BHW] to decodetree and use gvec
Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to translate them.
vabsdub: rept loop master patch 8 12
target/ppc: Move VABSDU[BHW] to decodetree and use gvec
Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to translate them.
vabsdub: rept loop master patch 8 12500 0,03601600 0,00688500 (-80.9%) 25 4000 0,03651000 0,00532100 (-85.4%) 100 1000 0,03666900 0,00595300 (-83.8%) 500 200 0,04305800 0,01244600 (-71.1%) 2500 40 0,06893300 0,04273700 (-38.0%) 8000 12 0,14633200 0,12660300 (-13.5%)
vabsduh: rept loop master patch 8 12500 0,02172400 0,00687500 (-68.4%) 25 4000 0,02154100 0,00531500 (-75.3%) 100 1000 0,02235400 0,00596300 (-73.3%) 500 200 0,02827500 0,01245100 (-56.0%) 2500 40 0,05638400 0,04285500 (-24.0%) 8000 12 0,13166000 0,12641400 (-4.0%)
vabsduw: rept loop master patch 8 12500 0,01646400 0,00688300 (-58.2%) 25 4000 0,01454500 0,00475500 (-67.3%) 100 1000 0,01545800 0,00511800 (-66.9%) 500 200 0,02168200 0,01114300 (-48.6%) 2500 40 0,04571300 0,04138800 (-9.5%) 8000 12 0,12209500 0,12178500 (-0.3%)
Same as VADDCUW and VSUBCUW, overall performance gain but it uses more TCGop (4 before the patch, 6 after).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-8-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
c85929b2 |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec
Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW, to decodetree and use gvec with them. For these one the right shift h
target/ppc: Move VAVG[SU][BHW] to decodetree and use gvec
Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW, to decodetree and use gvec with them. For these one the right shift had to be made before the sum as to avoid an overflow, so add 1 at the end if any of the entries had 1 in its LSB as to replicate the "+ 1" before the shift described by the ISA.
vavgub: rept loop master patch 8 12500 0,02616600 0,00754200 (-71.2%) 25 4000 0,02530000 0,00637700 (-74.8%) 100 1000 0,02604600 0,00790100 (-69.7%) 500 200 0,03189300 0,01838400 (-42.4%) 2500 40 0,06006900 0,06851000 (+14.1%) 8000 12 0,13941000 0,20548500 (+47.4%)
vavguh: rept loop master patch 8 12500 0,01818200 0,00780600 (-57.1%) 25 4000 0,01789300 0,00641600 (-64.1%) 100 1000 0,01899100 0,00787200 (-58.5%) 500 200 0,02527200 0,01828400 (-27.7%) 2500 40 0,05361800 0,06773000 (+26.3%) 8000 12 0,12886600 0,20291400 (+57.5%)
vavguw: rept loop master patch 8 12500 0,01423100 0,00776600 (-45.4%) 25 4000 0,01780800 0,00638600 (-64.1%) 100 1000 0,02085500 0,00787000 (-62.3%) 500 200 0,02737100 0,01828800 (-33.2%) 2500 40 0,05572600 0,06774200 (+21.6%) 8000 12 0,13101700 0,20311600 (+55.0%)
vavgsb: rept loop master patch 8 12500 0,03006000 0,00788600 (-73.8%) 25 4000 0,02882200 0,00637800 (-77.9%) 100 1000 0,02958000 0,00791400 (-73.2%) 500 200 0,03548800 0,01860400 (-47.6%) 2500 40 0,06360000 0,06850800 (+7.7%) 8000 12 0,13816500 0,20550300 (+48.7%)
vavgsh: rept loop master patch 8 12500 0,01965900 0,00776600 (-60.5%) 25 4000 0,01875400 0,00638700 (-65.9%) 100 1000 0,01952200 0,00786900 (-59.7%) 500 200 0,02562000 0,01760300 (-31.3%) 2500 40 0,05384300 0,06742800 (+25.2%) 8000 12 0,13240800 0,20330000 (+53.5%)
vavgsw: rept loop master patch 8 12500 0,01407700 0,00775600 (-44.9%) 25 4000 0,01762300 0,00640000 (-63.7%) 100 1000 0,02046500 0,00788500 (-61.5%) 500 200 0,02745600 0,01843000 (-32.9%) 2500 40 0,05375500 0,06820500 (+26.9%) 8000 12 0,13068300 0,20304900 (+55.4%)
These results to me seems to indicate that with gvec the results have a slower translation but faster execution.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-7-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
d57fbd8f |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8, respective
target/ppc: Move VPRTYB[WDQ] to decodetree and use gvec
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8, respectively.
vprtybw: rept loop master patch 8 12500 0,01198900 0,00703100 (-41.4%) 25 4000 0,01070100 0,00571400 (-46.6%) 100 1000 0,01123300 0,00678200 (-39.6%) 500 200 0,01601500 0,01535600 (-4.1%) 2500 40 0,03872900 0,05562100 (43.6%) 8000 12 0,10047000 0,16643000 (65.7%)
vprtybd: rept loop master patch 8 12500 0,00757700 0,00788100 (4.0%) 25 4000 0,00652500 0,00669600 (2.6%) 100 1000 0,00714400 0,00825400 (15.5%) 500 200 0,01211000 0,01903700 (57.2%) 2500 40 0,03483800 0,07021200 (101.5%) 8000 12 0,09591800 0,21036200 (119.3%)
vprtybq: rept loop master patch 8 12500 0,00675600 0,00667200 (-1.2%) 25 4000 0,00619400 0,00643200 (3.8%) 100 1000 0,00707100 0,00751100 (6.2%) 500 200 0,01199300 0,01342000 (11.9%) 2500 40 0,03490900 0,04092900 (17.2%) 8000 12 0,09588200 0,11465100 (19.6%)
I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ, I'm not sure if it's worth to move those instructions. Comparing the assembly of the helper with the TCGop they are pretty similar, so I'm not sure why vprtybd took so much more time.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
90b5aadb |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Move VNEG[WD] to decodtree and use gvec
Moved the instructions VNEGW and VNEGD to decodetree and used gvec to decode it.
vnegw: rept loop master patch 8 12500
target/ppc: Move VNEG[WD] to decodtree and use gvec
Moved the instructions VNEGW and VNEGD to decodetree and used gvec to decode it.
vnegw: rept loop master patch 8 12500 0,01053200 0,00548400 (-47.9%) 25 4000 0,01030500 0,00390000 (-62.2%) 100 1000 0,01096300 0,00395400 (-63.9%) 500 200 0,01472000 0,00712300 (-51.6%) 2500 40 0,03809000 0,02147700 (-43.6%) 8000 12 0,09957100 0,06202100 (-37.7%)
vnegd: rept loop master patch 8 12500 0,00594600 0,00543800 (-8.5%) 25 4000 0,00575200 0,00396400 (-31.1%) 100 1000 0,00676100 0,00394800 (-41.6%) 500 200 0,01149300 0,00709400 (-38.3%) 2500 40 0,03441500 0,02169600 (-37.0%) 8000 12 0,09516900 0,06337000 (-33.4%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-5-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
611bc69b |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an implementation based on the helper, with the main difference being chan
target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvec
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an implementation based on the helper, with the main difference being changing the -1 (aka all bits set to 1) result returned by cmp when true to +1. It also implemented a .fni4 version of those instructions and dropped the helper.
vaddcuw: rept loop master patch 8 12500 0,01008200 0,00612400 (-39.3%) 25 4000 0,01091500 0,00471600 (-56.8%) 100 1000 0,01332500 0,00593700 (-55.4%) 500 200 0,01998500 0,01275700 (-36.2%) 2500 40 0,04704300 0,04364300 (-7.2%) 8000 12 0,10748200 0,11241000 (+4.6%)
vsubcuw: rept loop master patch 8 12500 0,01226200 0,00571600 (-53.4%) 25 4000 0,01493500 0,00462100 (-69.1%) 100 1000 0,01522700 0,00455100 (-70.1%) 500 200 0,02384600 0,01133500 (-52.5%) 2500 40 0,04935200 0,03178100 (-35.6%) 8000 12 0,09039900 0,09440600 (+4.4%)
Overall there was a gain in performance, but the TCGop code was still slightly bigger in the new version (it went from 4 to 5).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
306e4753 |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Move VMH[R]ADDSHS instruction to decodetree
This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find a satisfactory implementation with TCG inline.
vmhaddshs: rept loo
target/ppc: Move VMH[R]ADDSHS instruction to decodetree
This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find a satisfactory implementation with TCG inline.
vmhaddshs: rept loop master patch 8 12500 0,02983400 0,02648500 (-11.2%) 25 4000 0,02946000 0,02518000 (-14.5%) 100 1000 0,03104300 0,02638000 (-15.0%) 500 200 0,04002000 0,03502500 (-12.5%) 2500 40 0,08090100 0,07562200 (-6.5%) 8000 12 0,19242600 0,18626800 (-3.2%)
vmhraddshs: rept loop master patch 8 12500 0,03078600 0,02851000 (-7.4%) 25 4000 0,02793200 0,02746900 (-1.7%) 100 1000 0,02886000 0,02839900 (-1.6%) 500 200 0,03714700 0,03799200 (+2.3%) 2500 40 0,07948000 0,07852200 (-1.2%) 8000 12 0,19049800 0,18813900 (-1.2%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-3-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
dc46167a |
| 19-Oct-2022 |
Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> |
target/ppc: Moved VMLADDUHM to decodetree and use gvec
This patch moves VMLADDUHM to decodetree a creates a gvec implementation using mul_vec and add_vec.
rept loop master patch 8
target/ppc: Moved VMLADDUHM to decodetree and use gvec
This patch moves VMLADDUHM to decodetree a creates a gvec implementation using mul_vec and add_vec.
rept loop master patch 8 12500 0,01810500 0,00903100 (-50.1%) 25 4000 0,01739400 0,00747700 (-57.0%) 100 1000 0,01843600 0,00901400 (-51.1%) 500 200 0,02574600 0,01971000 (-23.4%) 2500 40 0,05921600 0,07121800 (+20.3%) 8000 12 0,15326700 0,21725200 (+41.7%)
The significant difference in performance when REPT is low and LOOP is high I think is due to the fact that the new implementation has a higher translation time, as when using a helper only 5 TCGop are used but with the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec equivalent so this instruction is implemented with the help of 5 others, vmuleu, vmulou, vmrgh, vmrgl and vpkum).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
af721a31 |
| 06-Sep-2022 |
Víctor Colombo <victor.colombo@eldorado.org.br> |
target/ppc: Set OV32 when OV is set
According to PowerISA: "OV32 is set whenever OV is implicitly set, and is set to the same value that OV is defined to be set to in 32-bit mode".
This patch chang
target/ppc: Set OV32 when OV is set
According to PowerISA: "OV32 is set whenever OV is implicitly set, and is set to the same value that OV is defined to be set to in 32-bit mode".
This patch changes helper_update_ov_legacy to set/clear ov32 when applicable.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20220906125523.38765-7-victor.colombo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
b7d30fae |
| 06-Jun-2022 |
Matheus Ferst <matheus.ferst@eldorado.org.br> |
target/ppc: use int128.h methods in vsubcuq
And also move the insn to decodetree and remove the now unused avr_qw_not, avr_qw_cmpu, and avr_qw_add methods.
Signed-off-by: Matheus Ferst <matheus.fer
target/ppc: use int128.h methods in vsubcuq
And also move the insn to decodetree and remove the now unused avr_qw_not, avr_qw_cmpu, and avr_qw_add methods.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br> Message-Id: <20220606150037.338931-8-matheus.ferst@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|