Name |
Date |
Size |
#Lines |
LOC |
||
---|---|---|---|---|---|---|
.. | - | - | ||||
README.rst | H A D | 25-Sep-2024 | 27.6 KiB | 715 | 536 | |
idef-parser.h | H A D | 09-Jun-2024 | 9.3 KiB | 242 | 125 | |
idef-parser.lex | H A D | 18-May-2023 | 23.2 KiB | 476 | 444 | |
idef-parser.y | H A D | 08-Aug-2024 | 28.9 KiB | 913 | 801 | |
macros.h.inc | H A D | 25-Sep-2024 | 5.3 KiB | 132 | 114 | |
parser-helpers.c | H A D | 09-Jun-2024 | 70.3 KiB | 2,149 | 1,757 | |
parser-helpers.h | H A D | 09-Jun-2024 | 11 KiB | 354 | 223 | |
prepare | H A D | 28-Nov-2023 | 809 | 25 | 3 |
README.rst
1 Hexagon ISA instruction definitions to tinycode generator compiler 2 ------------------------------------------------------------------ 3 4 idef-parser is a small compiler able to translate the Hexagon ISA description 5 language into tinycode generator code, that can be easily integrated into QEMU. 6 7 Compilation Example 8 ------------------- 9 10 To better understand the scope of the idef-parser, we'll explore an applicative 11 example. Let's start by one of the simplest Hexagon instruction: the ``add``. 12 13 The ISA description language represents the ``add`` instruction as 14 follows: 15 16 .. code:: c 17 18 A2_add(RdV, in RsV, in RtV) { 19 { RdV=RsV+RtV;} 20 } 21 22 idef-parser will compile the above code into the following code: 23 24 .. code:: c 25 26 /* A2_add */ 27 void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV, 28 TCGv_i32 RsV, TCGv_i32 RtV) 29 /* { RdV=RsV+RtV;} */ 30 { 31 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 32 tcg_gen_add_i32(tmp_0, RsV, RtV); 33 tcg_gen_mov_i32(RdV, tmp_0); 34 } 35 36 The output of the compilation process will be a function, containing the 37 tinycode generator code, implementing the correct semantics. That function will 38 not access any global variable, because all the accessed data structures will be 39 passed explicitly as function parameters. Among the passed parameters we will 40 have TCGv (tinycode variables) representing the input and output registers of 41 the architecture, integers representing the immediates that come from the code, 42 and other data structures which hold information about the disassemblation 43 context (``DisasContext`` struct). 44 45 Let's begin by describing the input code. The ``add`` instruction is associated 46 with a unique identifier, in this case ``A2_add``, which allows to distinguish 47 variants of the same instruction, and expresses the class to which the 48 instruction belongs, in this case ``A2`` corresponds to the Hexagon 49 ``ALU32/ALU`` instruction subclass. 50 51 After the instruction identifier, we have a series of parameters that represents 52 TCG variables that will be passed to the generated function. Parameters marked 53 with ``in`` are already initialized, while the others are output parameters. 54 55 We will leverage this information to infer several information: 56 57 - Fill in the output function signature with the correct TCGv registers 58 - Fill in the output function signature with the immediate integers 59 - Keep track of which registers, among the declared one, have been 60 initialized 61 62 Let's now observe the actual instruction description code, in this case: 63 64 .. code:: c 65 66 { RdV=RsV+RtV;} 67 68 This code is composed by a subset of the C syntax, and is the result of the 69 application of some macro definitions contained in the ``macros.h`` file. 70 71 This file is used to reduce the complexity of the input language where complex 72 variants of similar constructs can be mapped to a unique primitive, so that the 73 idef-parser has to handle a lower number of computation primitives. 74 75 As you may notice, the description code modifies the registers which have been 76 declared by the declaration statements. In this case all the three registers 77 will be declared, ``RsV`` and ``RtV`` will also be read and ``RdV`` will be 78 written. 79 80 Now let's have a quick look at the generated code, line by line. 81 82 :: 83 84 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 85 86 This code starts by declaring a temporary TCGv to hold the result from the sum 87 operation. 88 89 :: 90 91 tcg_gen_add_i32(tmp_0, RsV, RtV); 92 93 Then, we are generating the sum tinycode operator between the selected 94 registers, storing the result in the just declared temporary. 95 96 :: 97 98 tcg_gen_mov_i32(RdV, tmp_0); 99 100 The result of the addition is now stored in the temporary, we move it into the 101 correct destination register. This code may seem inefficient, but QEMU will 102 perform some optimizations on the tinycode, reducing the unnecessary copy. 103 104 Parser Input 105 ------------ 106 107 Before moving on to the structure of idef-parser itself, let us spend some words 108 on its' input. There are two preprocessing steps applied to the generated 109 instruction semantics in ``semantics_generated.pyinc`` that we need to consider. 110 Firstly, 111 112 :: 113 114 gen_idef_parser_funcs.py 115 116 which takes instruction semantics in ``semantics_generated.pyinc`` to C-like 117 pseudo code, output into ``idef_parser_input.h.inc``. For instance, the 118 ``J2_jumpr`` instruction which jumps to an address stored in a register 119 argument. This is instruction is defined as 120 121 :: 122 123 SEMANTICS( \ 124 "J2_jumpr", \ 125 "jumpr Rs32", \ 126 """{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}""" \ 127 ) 128 129 in ``semantics_generated.pyinc``. Running ``gen_idef_parser_funcs.py`` 130 we obtain the pseudo code 131 132 :: 133 134 J2_jumpr(in RsV) { 135 {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);} 136 } 137 138 with macros such as ``fJUMPR`` intact. 139 140 The second step is to expand macros into a form suitable for our parser. 141 These macros are defined in ``idef-parser/macros.h.inc`` and the step is 142 carried out by the ``prepare`` script which runs the C preprocessor on 143 ``idef_parser_input.h.inc`` to produce 144 ``idef_parser_input.preprocessed.h.inc``. 145 146 To finish the above example, after preprocessing ``J2_jumpr`` we obtain 147 148 :: 149 150 J2_jumpr(in RsV) { 151 {(PC = RsV);} 152 } 153 154 where ``fJUMPR(RsN,RsV,COF_TYPE_JUMPR);`` was expanded to ``(PC = RsV)``, 155 signifying a write to the Program Counter ``PC``. Note, that ``PC`` in 156 this expression is not a variable in the strict C sense since it is not 157 declared anywhere, but rather a symbol which is easy to match in 158 idef-parser later on. 159 160 Parser Structure 161 ---------------- 162 163 The idef-parser is built using the ``flex`` and ``bison``. 164 165 ``flex`` is used to split the input string into tokens, each described using a 166 regular expression. The token description is contained in the 167 ``idef-parser.lex`` source file. The flex-generated scanner takes care also to 168 extract from the input text other meaningful information, e.g., the numerical 169 value in case of an immediate constant, and decorates the token with the 170 extracted information. 171 172 ``bison`` is used to generate the actual parser, starting from the parsing 173 description contained in the ``idef-parser.y`` file. The generated parser 174 executes the ``main`` function at the end of the ``idef-parser.y`` file, which 175 opens input and output files, creates the parsing context, and eventually calls 176 the ``yyparse()`` function, which starts the execution of the LALR(1) parser 177 (see `Wikipedia <https://en.wikipedia.org/wiki/LALR_parser>`__ for more 178 information about LALR parsing techniques). The LALR(1) parser, whenever it has 179 to shift a token, calls the ``yylex()`` function, which is defined by the 180 flex-generated code, and reads the input file returning the next scanned token. 181 182 The tokens are mapped on the source language grammar, defined in the 183 ``idef-parser.y`` file to build a unique syntactic tree, according to the 184 specified operator precedences and associativity rules. 185 186 The grammar describes the whole file which contains the Hexagon instruction 187 descriptions, therefore it starts from the ``input`` nonterminal, which is a 188 list of instructions, each instruction is represented by the following grammar 189 rule, representing the structure of the input file shown above: 190 191 :: 192 193 instruction : INAME arguments code 194 | error 195 196 arguments : '(' ')' 197 | '(' argument_list ')'; 198 199 argument_list : argument_decl ',' argument_list 200 | argument_decl 201 202 argument_decl : REG 203 | PRED 204 | IN REG 205 | IN PRED 206 | IMM 207 | var 208 ; 209 210 code : '{' statements '}' 211 212 statements : statements statement 213 | statement 214 215 statement : control_statement 216 | var_decl ';' 217 | rvalue ';' 218 | code_block 219 | ';' 220 221 code_block : '{' statements '}' 222 | '{' '}' 223 224 With this initial portion of the grammar we are defining the instruction, its' 225 arguments, and its' statements. Each argument is defined by the 226 ``argument_decl`` rule, and can be either 227 228 :: 229 230 Description Example 231 ---------------------------------------- 232 output register RsV 233 output predicate register P0 234 input register in RsV 235 input predicate register in P0 236 immediate value 1234 237 local variable EA 238 239 Note, the only local variable allowed to be used as an argument is the effective 240 address ``EA``. Similarly, each statement can be a ``control_statement``, a 241 variable declaration such as ``int a;``, a code block, which is just a 242 bracket-enclosed list of statements, a ``';'``, which is a ``nop`` instruction, 243 and an ``rvalue ';'``. 244 245 Expressions 246 ~~~~~~~~~~~ 247 248 Allowed in the input code are C language expressions with a few exceptions 249 to simplify parsing. For instance, variable names such as ``RdV``, ``RssV``, 250 ``PdV``, ``CsV``, and other idiomatic register names from Hexagon, are 251 reserved specifically for register arguments. These arguments then map to 252 ``TCGv_i32`` or ``TCGv_i64`` depending on the register size. Similarly, ``UiV``, 253 ``riV``, etc. refer to immediate arguments and will map to C integers. 254 255 Also, as mentioned earlier, the names ``PC``, ``SP``, ``FP``, etc. are used to 256 refer to Hexagon registers such as the program counter, stack pointer, and frame 257 pointer seen here. Writes to these registers then correspond to assignments 258 ``PC = ...``, and reads correspond to uses of the variable ``PC``. 259 260 Moreover, another example of one such exception is the selective expansion of 261 macros present in ``macros.h``. As an example, consider the ``fABS`` macro which 262 in plain C is defined as 263 264 :: 265 266 #define fABS(A) (((A) < 0) ? (-(A)) : (A)) 267 268 and returns the absolute value of the argument ``A``. This macro is not included 269 in ``idef-parser/macros.h.inc`` and as such is not expanded and kept as a "call" 270 ``fABS(...)``. Reason being, that ``fABS`` is easier to match and map to 271 ``tcg_gen_abs_<width>``, compared to the full ternary expression above. Loads of 272 macros in ``macros.h`` are kept unexpanded to aid in parsing, as seen in the 273 example above, for more information see ``idef-parser/idef-parser.lex``. 274 275 Finally, in mapping these input expressions to tinycode generators, idef-parser 276 tries to perform as much as possible in plain C. Such as, performing binary 277 operations in C instead of tinycode generators, thus effectively constant 278 folding the expression. 279 280 Variables and Variable Declarations 281 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 282 283 Similarly to C, variables in the input code must be explicitly declared, such as 284 ``int var1;`` which declares an uninitialized variable ``var1``. Initialization 285 ``int var2 = 0;`` is also allowed and behaves as expected. In tinycode 286 generators the previous declarations are mapped to 287 288 :: 289 290 int var1; -> TCGv_i32 var1 = tcg_temp_new_i32(); 291 292 int var2 = 0; -> TCGv_i32 var1 = tcg_temp_new_i32(); 293 tcg_gen_movi_i32(j, ((int64_t) 0ULL)); 294 295 which are later automatically freed at the end of the function they're declared 296 in. Contrary to C, we only allow variables to be declared with an integer type 297 specified in the following table (without permutation of keywords) 298 299 :: 300 301 type bit-width signedness 302 ---------------------------------------------------------- 303 int 32 signed 304 signed 305 signed int 306 307 unsigned 32 unsigned 308 unsigned int 309 310 long 64 signed 311 long int 312 signed long 313 signed long int 314 315 unsigned long 64 unsigned 316 unsigned long int 317 318 long long 64 signed 319 long long int 320 signed long long 321 signed long long int 322 323 unsigned long long 64 unsigned 324 unsigned long long int 325 326 size[1,2,4,8][s,u]_t 8-64 signed or unsigned 327 328 In idef-parser, variable names are matched by a generic ``VARID`` token, 329 which will feature the variable name as a decoration. For a variable declaration 330 idef-parser calls ``gen_varid_allocate`` with the ``VARID`` token to save the 331 name, size, and bit width of the newly declared variable. In addition, this 332 function also ensures that variables aren't declared multiple times, and prints 333 and error message if that is the case. Upon use of a variable, the ``VARID`` 334 token is used to lookup the size and bit width of the variable. 335 336 Type System 337 ~~~~~~~~~~~ 338 339 idef-parser features a simple type system which is used to correctly implement 340 the signedness and bit width of the operations. 341 342 The type of each ``rvalue`` is determined by two attributes: its bit width 343 (``unsigned bit_width``) and its signedness (``HexSignedness signedness``). 344 345 For each operation, the type of ``rvalue``\ s influence the way in which the 346 operands are handled and emitted. For example a right shift between signed 347 operators will be an arithmetic shift, while one between unsigned operators 348 will be a logical shift. If one of the two operands is signed, and the other 349 is unsigned, the operation will be signed. 350 351 The bit width also influences the outcome of the operations, in particular while 352 the input languages features a fine granularity type system, with types of 8, 353 16, 32, 64 (and more for vectorial instructions) bits, the tinycode only 354 features 32 and 64 bit widths. We propagate as much as possible the fine 355 granularity type, until the value has to be used inside an operation between 356 ``rvalue``\ s; in that case if one of the two operands is greater than 32 bits 357 we promote the whole operation to 64 bit, taking care of properly extending the 358 two operands. Fortunately, the most critical instructions already feature 359 explicit casts and zero/sign extensions which are properly propagated down to 360 our parser. 361 362 The combination of ``rvalue``\ s are handled through the use of the 363 ``gen_bin_op`` and ``gen_bin_cmp`` helper functions. These two functions handle 364 the appropriate compile-time or run-time emission of operations to perform the 365 required computation. 366 367 Control Statements 368 ~~~~~~~~~~~~~~~~~~ 369 370 ``control_statement``\ s are all the statements which modify the order of 371 execution of the generated code according to input parameters. They are expanded 372 by the following grammar rule: 373 374 :: 375 376 control_statement : frame_check 377 | cancel_statement 378 | if_statement 379 | for_statement 380 | fpart1_statement 381 382 ``if_statement``\ s require the emission of labels and branch instructions which 383 effectively perform conditional jumps (``tcg_gen_brcondi``) according to the 384 value of an expression. Note, the tinycode generators we produce for conditional 385 statements do not perfectly mirror what would be expected in C, for instance we 386 do not reproduce short-circuiting of the ``&&`` operator, and use of the ``||`` 387 operator is disallowed. All the predicated instructions, and in general all the 388 instructions where there could be alternative values assigned to an ``lvalue``, 389 like C-style ternary expressions: 390 391 :: 392 393 rvalue : rvalue QMARK rvalue COLON rvalue 394 395 are handled using the conditional move tinycode instruction 396 (``tcg_gen_movcond``), which avoids the additional complexity of managing labels 397 and jumps. 398 399 Instead, regarding the ``for`` loops, exploiting the fact that they always 400 iterate on immediate values, therefore their iteration ranges are always known 401 at compile time, we implemented those emitting plain C ``for`` loops. This is 402 possible because the loops will be executed in the QEMU code, leading to the 403 consequential unrolling of the for loop, since the tinycode generator 404 instructions will be executed multiple times, and the respective generated 405 tinycode will represent the unrolled execution of the loop. 406 407 Parsing Context 408 ~~~~~~~~~~~~~~~ 409 410 All the helper functions in ``idef-parser.y`` carry two fixed parameters, which 411 are the parsing context ``c`` and the ``YYLLOC`` location information. The 412 context is explicitly passed to all the functions because the parser we generate 413 is a reentrant one, meaning that it does not have any global variable, and 414 therefore the instruction compilation could easily be parallelized in the 415 future. Finally for each rule we propagate information about the location of the 416 involved tokens to generate pretty error reporting, able to highlight the 417 portion of the input code which generated each error. 418 419 Debugging 420 --------- 421 422 Developing the idef-parser can lead to two types of errors: compile-time errors 423 and parsing errors. 424 425 Compile-time errors in Bison-generated parsers are usually due to conflicts in 426 the described grammar. Conflicts forbid the grammar to produce a unique 427 derivation tree, thus must be solved (except for the dangling else problem, 428 which is marked as expected through the ``%expect 1`` Bison option). 429 430 For solving conflicts you need a basic understanding of `shift-reduce conflicts 431 <https://www.gnu.org/software/Bison/manual/html_node/Shift_002fReduce.html>`__ 432 and `reduce-reduce conflicts 433 <https://www.gnu.org/software/Bison/manual/html_node/Reduce_002fReduce.html>`__, 434 then, if you are using a Bison version > 3.7.1 you can ask Bison to generate 435 some counterexamples which highlight ambiguous derivations, passing the 436 ``-Wcex`` option to Bison. In general shift/reduce conflicts are solved by 437 redesigning the grammar in an unambiguous way or by setting the token priority 438 correctly, while reduce/reduce conflicts are solved by redesigning the 439 interested part of the grammar. 440 441 Run-time errors can be divided between lexing and parsing errors, lexing errors 442 are hard to detect, since the ``var`` token will catch everything which is not 443 caught by other tokens, but easy to fix, because most of the time a simple 444 regex editing will be enough. 445 446 idef-parser features a fancy parsing error reporting scheme, which for each 447 parsing error reports the fragment of the input text which was involved in the 448 parsing rule that generated an error. 449 450 Implementing an instruction goes through several sequential steps, here are some 451 suggestions to make each instruction proceed to the next step. 452 453 - not-emitted 454 455 Means that the parsing of the input code relative to that instruction failed, 456 this could be due to a lexical error or to some mismatch between the order of 457 valid tokens and a parser rule. You should check that tokens are correctly 458 identified and mapped, and that there is a rule matching the token sequence 459 that you need to parse. 460 461 - emitted 462 463 This instruction class contains all the instructions which are emitted but 464 fail to compile when included in QEMU. The compilation errors are shown by 465 the QEMU building process and will lead to fixing the bug. Most common 466 errors regard the mismatch of parameters for tinycode generator functions, 467 which boil down to errors in the idef-parser type system. 468 469 - compiled 470 471 These instruction generate valid tinycode generator code, which however fail 472 the QEMU or the harness tests, these cases must be handled manually by 473 looking into the failing tests and looking at the generated tinycode 474 generator instruction and at the generated tinycode itself. Tip: handle the 475 failing harness tests first, because they usually feature only a single 476 instruction, thus will require less execution trace navigation. If a 477 multi-threaded test fail, fixing all the other tests will be the easier 478 option, hoping that the multi-threaded one will be indirectly fixed. 479 480 An example of debugging this type of failure is provided in the following 481 section. 482 483 - tests-passed 484 485 This is the final goal for each instruction, meaning that the instruction 486 passes the test suite. 487 488 Another approach to fix QEMU system test, where many instructions might fail, is 489 to compare the execution trace of your implementation with the reference 490 implementations already present in QEMU. To do so you should obtain a QEMU build 491 where the instruction pass the test, and run it with the following command: 492 493 :: 494 495 sudo unshare -p sudo -u <USER> bash -c \ 496 'env -i <qemu-hexagon full path> -d cpu <TEST>' 497 498 And do the same for your implementation, the generated execution traces will be 499 inherently aligned and can be inspected for behavioral differences using the 500 ``diff`` tool. 501 502 Example of debugging erroneous tinycode generator code 503 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 504 505 The goal of this section is to provide a complete example of debugging 506 incorrectly emitted tinycode generator for a single instruction. 507 508 Let's first introduce a bug in the tinycode generator of the ``A2_add`` 509 instruction, 510 511 :: 512 513 void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV, 514 TCGv_i32 RsV, TCGv_i32 RtV) 515 /* RdV=RsV+RtV;} */ 516 { 517 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 518 tcg_gen_add_i32(tmp_0, RsV, RsV); 519 tcg_gen_mov_i32(RdV, tmp_0); 520 } 521 522 Here the bug, albeit hard to spot, is in ``tcg_gen_add_i32(tmp_0, RsV, RsV);`` 523 where we compute ``RsV + RsV`` instead of ``RsV + RtV``, as would be expected. 524 This particular bug is a bit tricky to pinpoint when debugging, since the 525 ``A2_add`` instruction is so ubiquitous. As a result, pretty much all tests will 526 fail and therefore not provide a lot of information about the bug. 527 528 For example, let's run the ``check-tcg`` tests 529 530 :: 531 532 make check-tcg TIMEOUT=1200 \ 533 DOCKER_IMAGE=debian-hexagon-cross \ 534 ENGINE=podman V=1 \ 535 DOCKER_CROSS_CC_GUEST=hexagon-unknown-linux-musl-clang 536 537 In the output, we find a failure in the very first test case ``float_convs`` 538 due to a segmentation fault. Similarly, all harness and libc tests will fail as 539 well. At this point we have no clue where the actual bug lies, and need to start 540 ruling out instructions. As such a good starting point is to utilize the debug 541 options ``-d in_asm,cpu`` of QEMU to inspect the Hexagon instructions being run, 542 alongside the CPU state. We additionally need a working version of the emulator 543 to compare our buggy CPU state against, running 544 545 :: 546 547 meson configure -Dhexagon_idef_parser=false 548 549 will disable the idef-parser for all instructions and fallback on manual 550 tinycode generator overrides, or on helper function implementations. Recompiling 551 gives us ``qemu-hexagon`` which passes all tests. If ``qemu-hexagon-buggy`` is 552 our binary with the incorrect tinycode generators, we can compare the CPU state 553 between the two versions 554 555 :: 556 557 ./qemu-hexagon-buggy -d in_asm,cpu float_convs &> out_buggy 558 ./qemu-hexagon -d in_asm,cpu float_convs &> out_working 559 560 Looking at ``diff -u out_buggy out_working`` shows us that the CPU state begins 561 to diverge on line 141, with an incorrect value in the ``R1`` register 562 563 :: 564 565 @@ -138,7 +138,7 @@ 566 567 General Purpose Registers = { 568 r0 = 0x4100f9c0 569 - r1 = 0x00042108 570 + r1 = 0x00000000 571 r2 = 0x00021084 572 r3 = 0x00000000 573 r4 = 0x00000000 574 575 If we also look into ``out_buggy`` directly we can inspect the input assembly 576 which the caused the incorrect CPU state, around line 141 we find 577 578 :: 579 580 116 | ---------------- 581 117 | IN: _start_c 582 118 | 0x000210b0: 0xa09dc002 { allocframe(R29,#0x10):raw } 583 ... | ... 584 137 | 0x000210fc: 0x5a00c4aa { call PC+2388 } 585 138 | 586 139 | General Purpose Registers = { 587 140 | r0 = 0x4100fa70 588 141 | r1 = 0x00042108 589 142 | r2 = 0x00021084 590 143 | r3 = 0x00000000 591 592 Importantly, we see some Hexagon assembly followed by a dump of the CPU state, 593 now the CPU state is actually dumped before the input assembly above is ran. 594 As such, we are actually interested in the instructions ran before this. 595 596 Scrolling up a bit, we find 597 598 :: 599 600 54 | ---------------- 601 55 | IN: _start 602 56 | 0x00021088: 0x6a09c002 { R2 = C9/pc } 603 57 | 0x0002108c: 0xbfe2ff82 { R2 = add(R2,#0xfffffffc) } 604 58 | 0x00021090: 0x9182c001 { R1 = memw(R2+#0x0) } 605 59 | 0x00021094: 0xf302c101 { R1 = add(R2,R1) } 606 60 | 0x00021098: 0x7800c01e { R30 = #0x0 } 607 61 | 0x0002109c: 0x707dc000 { R0 = R29 } 608 62 | 0x000210a0: 0x763dfe1d { R29 = and(R29,#0xfffffff0) } 609 63 | 0x000210a4: 0xa79dfdfe { memw(R29+#0xfffffff8) = R29 } 610 64 | 0x000210a8: 0xbffdff1d { R29 = add(R29,#0xfffffff8) } 611 65 | 0x000210ac: 0x5a00c002 { call PC+4 } 612 66 | 613 67 | General Purpose Registers = { 614 68 | r0 = 0x00000000 615 69 | r1 = 0x00000000 616 70 | r2 = 0x00000000 617 71 | r3 = 0x00000000 618 619 Remember, the instructions on lines 56-65 are ran on the CPU state shown below 620 instructions, and as the CPU state has not diverged at this point, we know the 621 starting state is accurate. The bug must then lie within the instructions shown 622 here. Next we may notice that ``R1`` is only touched by lines 57 and 58, that is 623 by 624 625 :: 626 627 58 | 0x00021090: 0x9182c001 { R1 = memw(R2+#0x0) } 628 59 | 0x00021094: 0xf302c101 { R1 = add(R2,R1) } 629 630 Therefore, we are either dealing with an correct load instruction 631 ``R1 = memw(R2+#0x0)`` or with an incorrect add ``R1 = add(R2,R1)``. At this 632 point it might be easy enough to go directly to the emitted code for the 633 instructions mentioned and look for bugs, but we could also run 634 ``./qemu-heaxgon -d op,in_asm float_conv`` where we find for the following 635 tinycode for the Hexagon ``add`` instruction 636 637 :: 638 639 ---- 00021094 640 mov_i32 pkt_has_store_s1,$0x0 641 add_i32 tmp0,r2,r2 642 mov_i32 loc2,tmp0 643 mov_i32 new_r1,loc2 644 mov_i32 r1,new_r1 645 646 Here we have finally located our bug ``add_i32 tmp0,r2,r2``. 647 648 Limitations and Future Development 649 ---------------------------------- 650 651 The main limitation of the current parser is given by the syntax-driven nature 652 of the Bison-generated parsers. This has the severe implication of only being 653 able to generate code in the order of evaluation of the various rules, without, 654 in any case, being able to backtrack and alter the generated code. 655 656 An example limitation is highlighted by this statement of the input language: 657 658 :: 659 660 { (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); } 661 662 This ternary assignment, when written in this form requires us to emit some 663 proper control flow statements, which emit a jump to the first or to the second 664 code block, whose implementation is extremely convoluted, because when matching 665 the ternary assignment, the code evaluating the two assignments will be already 666 generated. 667 668 Instead we pre-process that statement, making it become: 669 670 :: 671 672 { PdV = ((PsV==0xff)) ? 0xff : 0x00; } 673 674 Which can be easily matched by the following parser rules: 675 676 :: 677 678 statement | rvalue ';' 679 680 rvalue : rvalue QMARK rvalue COLON rvalue 681 | rvalue EQ rvalue 682 | LPAR rvalue RPAR 683 | assign_statement 684 | IMM 685 686 assign_statement : pred ASSIGN rvalue 687 688 Another example that highlight the limitation of the flex/bison parser can be 689 found even in the add operation we already saw: 690 691 :: 692 693 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 694 tcg_gen_add_i32(tmp_0, RsV, RtV); 695 tcg_gen_mov_i32(RdV, tmp_0); 696 697 The fact that we cannot directly use ``RdV`` as the destination of the sum is a 698 consequence of the syntax-driven nature of the parser. In fact when we parse the 699 assignment, the ``rvalue`` token, representing the sum has already been reduced, 700 and thus its code emitted and unchangeable. We rely on the fact that QEMU will 701 optimize our code reducing the useless move operations and the relative 702 temporaries. 703 704 A possible improvement of the parser regards the support for vectorial 705 instructions and floating point instructions, which will require the extension 706 of the scanner, the parser, and a partial re-design of the type system, allowing 707 to build the vectorial semantics over the available vectorial tinycode generator 708 primitives. 709 710 A more radical improvement will use the parser, not to generate directly the 711 tinycode generator code, but to generate an intermediate representation like the 712 LLVM IR, which in turn could be compiled using the clang TCG backend. That code 713 could be furtherly optimized, overcoming the limitations of the syntax-driven 714 parsing and could lead to a more optimized generated code. 715