Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
processor (DSP).

The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center

*** Tour of the code ***

The qemu-hexagon implementation is a combination of qemu and the Hexagon
architecture library (aka archlib).  The three primary directories with
Hexagon-specific code are

    qemu/target/hexagon
        This has all the instruction and packet semantics
    qemu/target/hexagon/imported
        These files are imported with very little modification from archlib
        *.idef                  Instruction semantics definition
        macros.def              Mapping of macros to instruction attributes
        encode*.def             Encoding patterns for each instruction
        iclass.def              Instruction class definitions used to determine
                                legal VLIW slots for each instruction
    qemu/linux-user/hexagon
        Helpers for loading the ELF file and making Linux system calls,
        signals, etc

We start with scripts that generate a bunch of include files.  This
is a two-step process.  The first step is to use the C preprocessor to expand
macros inside the architecture definition files.  This is done in
target/hexagon/gen_semantics.c.  This step produces
    <BUILD_DIR>/target/hexagon/semantics_generated.pyinc.
That file is consumed by the following python scripts to produce the indicated
header files in <BUILD_DIR>/target/hexagon
        gen_opcodes_def.py              -> opcodes_def_generated.h.inc
        gen_op_regs.py                  -> op_regs_generated.h.inc
        gen_printinsn.py                -> printinsn_generated.h.inc
        gen_op_attribs.py               -> op_attribs_generated.h.inc
        gen_helper_protos.py            -> helper_protos_generated.h.inc
        gen_shortcode.py                -> shortcode_generated.h.inc
        gen_tcg_funcs.py                -> tcg_funcs_generated.c.inc
        gen_tcg_func_table.py           -> tcg_func_table_generated.c.inc
        gen_helper_funcs.py             -> helper_funcs_generated.c.inc

Qemu helper functions have 3 parts
    DEF_HELPER declaration indicates the signature of the helper
    gen_helper_<NAME> will generate a TCG call to the helper function
    The helper implementation

Here's an example of the A2_add instruction.
    Instruction tag        A2_add
    Assembly syntax        "Rd32=add(Rs32,Rt32)"
    Instruction semantics  "{ RdV=RsV+RtV;}"

By convention, the operands are identified by letter
    RdV is the destination register
    RsV, RtV are source registers

The generator uses the operand naming conventions (see large comment in
hex_common.py) to determine the signature of the helper function.
Here are the results for A2_add

helper_protos_generated.h.inc
    DEF_HELPER_3(A2_add, s32, env, s32, s32)

tcg_funcs_generated.c.inc
    static void generate_A2_add(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        TCGv RdV = tcg_temp_local_new();
        const int RdN = insn->regno[0];
        TCGv RsV = hex_gpr[insn->regno[1]];
        TCGv RtV = hex_gpr[insn->regno[2]];
        gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
        gen_log_reg_write(RdN, RdV);
        ctx_log_reg_write(ctx, RdN);
        tcg_temp_free(RdV);
    }

helper_funcs_generated.c.inc
    int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
    {
        uint32_t slot __attribute__((unused)) = 4;
        int32_t RdV = 0;
        { RdV=RsV+RtV;}
        return RdV;
    }

Note that generate_A2_add updates the disassembly context to be processed
when the packet commits (see "Packet Semantics" below).

The generator checks for an fGEN_TCG_<tag> macro.  This allows us to generate
TCG code instead of a call to the helper.  If defined, the macro takes 1
argument.
    C semantics (aka short code)

This allows the code generator to override the auto-generated code.  In some
cases this is necessary for correct execution.  We can also override for
faster emulation.  For example, calling a helper for add is more expensive
than generating a TCG add operation.

The gen_tcg.h file contains any overrides.  For example, we could write
    #define fGEN_TCG_A2_add(SHORTCODE) \
        tcg_gen_add_tl(RdV, RsV, RtV)

The instruction semantics C code relies heavily on macros.  In cases where the
C semantics are specified only with macros, we can override the default with
the short semantics option and #define the macros to generate TCG code.
One example is L2_loadw_locked:
    Instruction tag        L2_loadw_locked
    Assembly syntax        "Rd32=memw_locked(Rs32)"
    Instruction semantics  "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }"

In gen_tcg.h, we use the shortcode
    #define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
        SHORTCODE

There are also cases where we brute force the TCG code generation.
Instructions with multiple definitions are examples.  These require special
handling because qemu helpers can only return a single value.

In addition to instruction semantics, we use a generator to create the decode
tree.  This generation is also a two-step process.  The first step is to run
target/hexagon/gen_dectree_import.c to produce
    <BUILD_DIR>/target/hexagon/iset.py
This file is imported by target/hexagon/dectree.py to produce
    <BUILD_DIR>/target/hexagon/dectree_generated.h.inc

*** Key Files ***

cpu.h

This file contains the definition of the CPUHexagonState struct.  It holds the
runtime information for each thread, such as the GPR and predicate registers.

macros.h

The Hexagon archlib relies heavily on macros for the instruction semantics.
This is a great advantage for qemu because we can override them for different
purposes.  You will also notice there are sometimes two definitions of a macro.
The QEMU_GENERATE variable determines whether we want the macro to generate TCG
code.  If QEMU_GENERATE is not defined, we want the macro to generate vanilla
C code that will work in the helper implementation.

translate.c

The functions in this file generate TCG code for a translation block.
Some important functions in this file are

    gen_start_packet - initialize the data structures for packet semantics
    gen_commit_packet - commit the register writes, stores, etc for a packet
    decode_and_translate_packet - disassemble a packet and generate code

genptr.c
gen_tcg.h

These files create a function for each instruction.  They are mostly composed
of fGEN_TCG_<tag> definitions followed by including tcg_funcs_generated.c.inc.

op_helper.c

This file contains the implementations of all the helpers.  There are a few
general purpose helpers, but most of them are generated by including
helper_funcs_generated.c.inc.  There are also several helpers used for debugging.


*** Packet Semantics ***

VLIW packet semantics differ from serial semantics in that all input operands
are read, then the operations are performed, then all the results are written.
For example, this packet performs a swap of registers r0 and r1
    { r0 = r1; r1 = r0 }
Note that the result is different if the instructions are executed serially.

Packet semantics dictate that we defer any changes of state until the entire
packet is committed.  We record the results of each instruction in a side data
structure, and update the visible processor state when we commit the packet.

The data structures are divided between the runtime state and the translation
context.

During the TCG generation (see translate.[ch]), we use the DisasContext to
track what needs to be done during packet commit.
Here are the relevant fields

    reg_log            list of registers written
    reg_log_idx        index into reg_log
    pred_log           list of predicates written
    pred_log_idx       index into pred_log
    store_width        width of stores (indexed by slot)

During runtime, the following fields in CPUHexagonState (see cpu.h) are used

    new_value             new value of a given register
    reg_written           boolean indicating if register was written
    new_pred_value        new value of a predicate register
    pred_written          boolean indicating if predicate was written
    mem_log_stores        record of the stores (indexed by slot)

*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
internal.h.  This will stream a lot of information as it generates TCG and
executes the code.

To track down nasty issues with Hexagon->TCG generation, we compare the
execution results with actual hardware running on a Hexagon Linux target.
Run qemu with the "-d cpu" option.  Then, we can diff the results and figure
out where qemu and hardware behave differently.

The stack is at a different address under qemu than on hardware.  We handle
this by changing env->stack_adjust in translate.c.  First, set this to zero
and run qemu.  Then, change env->stack_adjust to the difference between the
two stack locations.  Then rebuild qemu and run again.  That will produce a
very clean diff.

Here are some handy places to set breakpoints

    At the call to gen_start_packet for a given PC (note that the line number
        might change in the future)
        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
    The helper function for each instruction is named helper_<TAG>, so here's
        an example that will set a breakpoint at the start
        br helper_A2_add
    If you have the HEX_DEBUG macro set, the following will be useful
        At the start of execution of a packet for a given PC
            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
        At the end of execution of a packet for a given PC
            br helper_debug_commit_end if env->this_PC == 0xdeadbeef
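
When stepping through a packet in the debugger, it can help to keep a small
model of the deferred-commit scheme from the "Packet Semantics" section in
mind.  The following is only a toy sketch in plain C, not the qemu code.
ToyState and the toy_* functions are hypothetical names made up for this
illustration.  Only the field names new_value and reg_written are borrowed
from the CPUHexagonState description above, and the four-register file is an
arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_REGS 4  /* arbitrary size for the illustration */

/* Toy stand-in for the runtime state; NOT the real CPUHexagonState. */
typedef struct {
    uint32_t gpr[TOY_NUM_REGS];        /* architecturally visible registers */
    uint32_t new_value[TOY_NUM_REGS];  /* pending result for each register */
    bool reg_written[TOY_NUM_REGS];    /* true if the packet wrote the register */
} ToyState;

/* Record a register write in the side structure; gpr[] is left untouched. */
static void toy_log_reg_write(ToyState *s, unsigned regno, uint32_t value)
{
    assert(regno < TOY_NUM_REGS);
    s->new_value[regno] = value;
    s->reg_written[regno] = true;
}

/* Make all logged writes visible, then clear the log for the next packet. */
static void toy_commit_packet(ToyState *s)
{
    for (unsigned i = 0; i < TOY_NUM_REGS; i++) {
        if (s->reg_written[i]) {
            s->gpr[i] = s->new_value[i];
            s->reg_written[i] = false;
        }
    }
}

/* Packet semantics for { r0 = r1; r1 = r0 }: both reads see the old state. */
static void toy_swap_packet(ToyState *s)
{
    toy_log_reg_write(s, 0, s->gpr[1]);
    toy_log_reg_write(s, 1, s->gpr[0]);
    toy_commit_packet(s);
}

/* Serial execution of the same two instructions, for contrast. */
static void toy_swap_serial(ToyState *s)
{
    s->gpr[0] = s->gpr[1];
    s->gpr[1] = s->gpr[0];
}
```

Starting from r0 = 10 and r1 = 20, toy_swap_packet leaves r0 = 20 and r1 = 10,
while toy_swap_serial leaves both registers at 20.  That difference is the
reason the writes are logged first and applied only at commit.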