Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
processor (DSP).

The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center

*** Tour of the code ***

The qemu-hexagon implementation is a combination of qemu and the Hexagon
architecture library (aka archlib).
The three primary directories with Hexagon-specific code are

    qemu/target/hexagon
        This has all the instruction and packet semantics
    qemu/target/hexagon/imported
        These files are imported with very little modification from archlib
        *.idef                  Instruction semantics definition
        macros.def              Mapping of macros to instruction attributes
        encode*.def             Encoding patterns for each instruction
        iclass.def              Instruction class definitions used to determine
                                legal VLIW slots for each instruction
    qemu/linux-user/hexagon
        Helpers for loading the ELF file and making Linux system calls,
        signals, etc

We start with scripts that generate a bunch of include files.  This
is a two step process.  The first step is to use the C preprocessor to expand
macros inside the architecture definition files.  This is done in
target/hexagon/gen_semantics.c.  This step produces
    <BUILD_DIR>/target/hexagon/semantics_generated.pyinc.
That file is consumed by the following python scripts to produce the indicated
header files in <BUILD_DIR>/target/hexagon
        gen_opcodes_def.py              -> opcodes_def_generated.h.inc
        gen_op_regs.py                  -> op_regs_generated.h.inc
        gen_printinsn.py                -> printinsn_generated.h.inc
        gen_op_attribs.py               -> op_attribs_generated.h.inc
        gen_helper_protos.py            -> helper_protos_generated.h.inc
        gen_shortcode.py                -> shortcode_generated.h.inc
        gen_tcg_funcs.py                -> tcg_funcs_generated.c.inc
        gen_tcg_func_table.py           -> tcg_func_table_generated.c.inc
        gen_helper_funcs.py             -> helper_funcs_generated.c.inc

Qemu helper functions have 3 parts
    DEF_HELPER declaration indicates the signature of the helper
    gen_helper_<NAME> will generate a TCG call to the helper function
    The helper implementation

Here's an example of the A2_add instruction.
    Instruction tag        A2_add
    Assembly syntax        "Rd32=add(Rs32,Rt32)"
    Instruction semantics  "{ RdV=RsV+RtV;}"

By convention, the operands are identified by letter
    RdV is the destination register
    RsV, RtV are source registers

The generator uses the operand naming conventions (see large comment in
hex_common.py) to determine the signature of the helper function.
Here are the results for A2_add

helper_protos_generated.h.inc
    DEF_HELPER_3(A2_add, s32, env, s32, s32)

tcg_funcs_generated.c.inc
    static void generate_A2_add(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        TCGv RdV = tcg_temp_local_new();
        const int RdN = insn->regno[0];
        TCGv RsV = hex_gpr[insn->regno[1]];
        TCGv RtV = hex_gpr[insn->regno[2]];
        gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
        gen_log_reg_write(RdN, RdV);
        ctx_log_reg_write(ctx, RdN);
        tcg_temp_free(RdV);
    }

helper_funcs_generated.c.inc
    int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
    {
        uint32_t slot __attribute__((unused)) = 4;
        int32_t RdV = 0;
        { RdV=RsV+RtV;}
        return RdV;
    }

Note that generate_A2_add updates the disassembly context to be processed
when the packet commits (see "Packet Semantics" below).

The generator checks for an fGEN_TCG_<tag> macro.  This allows us to generate
TCG code instead of a call to the helper.  If defined, the macro takes 1
argument.
    C semantics (aka short code)

This allows the code generator to override the auto-generated code.  In some
cases this is necessary for correct execution.  We can also override for
faster emulation.  For example, calling a helper for add is more expensive
than generating a TCG add operation.

The gen_tcg.h file holds the overrides.  For example, we could write
    #define fGEN_TCG_A2_add(SHORTCODE) \
        tcg_gen_add_tl(RdV, RsV, RtV)

The instruction semantics C code relies heavily on macros.  In cases where the
C semantics are specified only with macros, we can override the default with
the short semantics option and #define the macros to generate TCG code.  One
example is L2_loadw_locked:
    Instruction tag        L2_loadw_locked
    Assembly syntax        "Rd32=memw_locked(Rs32)"
    Instruction semantics  "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }"

In gen_tcg.h, we use the shortcode
    #define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
        SHORTCODE

There are also cases where we brute force the TCG code generation.
Instructions with multiple definitions (that is, more than one result) are
examples.  These require special handling because qemu helpers can only
return a single value.
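
The choice between an override and a helper call, as described above, can be
pictured with a small stand-alone C sketch.  This is an analogy only: it uses
plain integer arithmetic instead of TCG ops, and the #ifdef selection is an
assumption made for illustration, not a statement of how the real generator
decides.  The functions helper_A2_add, generate_A2_add and
check_override_matches_helper below are hypothetical stand-ins that merely
reuse names from this document.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical override, playing the role of an entry in gen_tcg.h.
 * It ignores SHORTCODE and computes the sum directly.
 * Delete this #define to make generate_A2_add() fall back to the helper.
 */
#define fGEN_TCG_A2_add(SHORTCODE) \
    do { RdV = RsV + RtV; } while (0)

/* Stand-in for the out-of-line helper (not the real qemu helper). */
int32_t helper_A2_add(int32_t RsV, int32_t RtV)
{
    int32_t RdV = 0;
    { RdV=RsV+RtV;}
    return RdV;
}

/* Stand-in for a generated per-instruction function. */
int32_t generate_A2_add(int32_t RsV, int32_t RtV)
{
    int32_t RdV = 0;
#ifdef fGEN_TCG_A2_add
    /* An override exists: expand it, passing the short code as the argument. */
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
#else
    /* No override: call the helper. */
    RdV = helper_A2_add(RsV, RtV);
#endif
    return RdV;
}

/* Either path must produce the same architectural result. */
void check_override_matches_helper(void)
{
    assert(generate_A2_add(2, 3) == helper_A2_add(2, 3));
    assert(generate_A2_add(-7, 7) == 0);
}
```

The point of the sketch is that an override changes how the result is
computed, never what the result is; comparing the two paths is a cheap sanity
check when adding a new override.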

In addition to instruction semantics, we use a generator to create the decode
tree.  This generation is also a two step process.  The first step is to run
target/hexagon/gen_dectree_import.c to produce
    <BUILD_DIR>/target/hexagon/iset.py
This file is imported by target/hexagon/dectree.py to produce
    <BUILD_DIR>/target/hexagon/dectree_generated.h.inc

*** Key Files ***

cpu.h

This file contains the definition of the CPUHexagonState struct.  It is the
runtime information for each thread and contains state such as the GPR and
predicate registers.

macros.h

The Hexagon archlib relies heavily on macros for the instruction semantics.
This is a great advantage for qemu because we can override them for different
purposes.  You will also notice there are sometimes two definitions of a macro.
The QEMU_GENERATE variable determines whether we want the macro to generate TCG
code.  If QEMU_GENERATE is not defined, we want the macro to generate vanilla
C code that will work in the helper implementation.

translate.c

The functions in this file generate TCG code for a translation block.
Some important functions in this file are

    gen_start_packet - initialize the data structures for packet semantics
    gen_commit_packet - commit the register writes, stores, etc for a packet
    decode_and_translate_packet - disassemble a packet and generate code

genptr.c
gen_tcg.h

These files create a function for each instruction.  They are mostly composed
of fGEN_TCG_<tag> definitions followed by including tcg_funcs_generated.c.inc.

op_helper.c

This file contains the implementations of all the helpers.  There are a few
general purpose helpers, but most of them are generated by including
helper_funcs_generated.c.inc.  There are also several helpers used for debugging.


*** Packet Semantics ***

VLIW packet semantics differ from serial semantics in that all input operands
are read, then the operations are performed, then all the results are written.
For example, this packet performs a swap of registers r0 and r1
    { r0 = r1; r1 = r0 }
Note that the result is different if the instructions are executed serially.

Packet semantics dictate that we defer any changes of state until the entire
packet is committed.
We record the results of each instruction in a side data structure, and
update the visible processor state when we commit the packet.

The data structures are divided between the runtime state and the translation
context.

During the TCG generation (see translate.[ch]), we use the DisasContext to
track what needs to be done during packet commit.  Here are the relevant
fields

    reg_log            list of registers written
    reg_log_idx        index into reg_log
    pred_log           list of predicates written
    pred_log_idx       index into pred_log
    store_width        width of stores (indexed by slot)

During runtime, the following fields in CPUHexagonState (see cpu.h) are used

    new_value             new value of a given register
    reg_written           boolean indicating if register was written
    new_pred_value        new value of a predicate register
    pred_written          boolean indicating if predicate was written
    mem_log_stores        record of the stores (indexed by slot)

*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
internal.h.  This will stream a lot of information as it generates TCG and
executes the code.

To track down nasty issues with Hexagon->TCG generation, we compare the
execution results with actual hardware running on a Hexagon Linux target.
Run qemu with the "-d cpu" option.  Then, we can diff the results and figure
out where qemu and hardware behave differently.

The stack is located at a different address in qemu than on hardware.  We
handle this by changing env->stack_adjust in translate.c.  First, set this to
zero and run qemu.  Then, change env->stack_adjust to the difference between
the two stack locations.  Then rebuild qemu and run again.  That will produce
a very clean diff.

Here are some handy places to set breakpoints

    At the call to gen_start_packet for a given PC (note that the line number
        might change in the future)
        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
    The helper function for each instruction is named helper_<TAG>, so here's
        an example that will set a breakpoint at the start
        br helper_A2_add
    If you have the HEX_DEBUG macro set, the following will be useful
        At the start of execution of a packet for a given PC
            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
        At the end of execution of a packet for a given PC
            br helper_debug_commit_end if env->this_PC == 0xdeadbeef
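
When stopping at the packet-level breakpoints above, it helps to keep the
deferred-commit model from "Packet Semantics" in mind: between the start and
the commit of a packet, the visible registers still hold their old values.
The following is a minimal stand-alone sketch of that model, not the qemu
code.  The field names new_value and reg_written are taken from the list in
"Packet Semantics"; SketchState, NUM_REGS, start_packet, log_reg_write,
commit_packet and swap_packet are hypothetical names introduced only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 4  /* hypothetical: a tiny register file for illustration */

/* Simplified stand-in for the runtime state (not CPUHexagonState). */
typedef struct {
    uint32_t gpr[NUM_REGS];        /* visible register state */
    uint32_t new_value[NUM_REGS];  /* pending value of each register */
    bool reg_written[NUM_REGS];    /* was the register written in this packet? */
} SketchState;

/* Clear the side data structure at the start of a packet. */
static void start_packet(SketchState *env)
{
    for (int i = 0; i < NUM_REGS; i++) {
        env->reg_written[i] = false;
    }
}

/* Record a result without touching the visible register. */
static void log_reg_write(SketchState *env, int rnum, uint32_t val)
{
    assert(rnum >= 0 && rnum < NUM_REGS);
    env->new_value[rnum] = val;
    env->reg_written[rnum] = true;
}

/* Make all recorded results visible at once. */
static void commit_packet(SketchState *env)
{
    for (int i = 0; i < NUM_REGS; i++) {
        if (env->reg_written[i]) {
            env->gpr[i] = env->new_value[i];
            env->reg_written[i] = false;
        }
    }
}

/* The packet { r0 = r1; r1 = r0 }: both reads see the pre-packet values. */
void swap_packet(SketchState *env)
{
    start_packet(env);
    log_reg_write(env, 0, env->gpr[1]);
    log_reg_write(env, 1, env->gpr[0]);
    commit_packet(env);
}
```

Because the second instruction reads env->gpr[0] before anything is
committed, the two registers are exchanged.  Writing the results straight
into gpr[] instead would leave both registers holding the old value of r1,
which is the serial behavior the packet model rules out.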