1Hexagon ISA instruction definitions to tinycode generator compiler 2------------------------------------------------------------------ 3 4idef-parser is a small compiler able to translate the Hexagon ISA description 5language into tinycode generator code, that can be easily integrated into QEMU. 6 7Compilation Example 8------------------- 9 10To better understand the scope of the idef-parser, we'll explore an applicative 11example. Let's start by one of the simplest Hexagon instruction: the ``add``. 12 13The ISA description language represents the ``add`` instruction as 14follows: 15 16.. code:: c 17 18 A2_add(RdV, in RsV, in RtV) { 19 { RdV=RsV+RtV;} 20 } 21 22idef-parser will compile the above code into the following code: 23 24.. code:: c 25 26 /* A2_add */ 27 void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV, 28 TCGv_i32 RsV, TCGv_i32 RtV) 29 /* { RdV=RsV+RtV;} */ 30 { 31 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 32 tcg_gen_add_i32(tmp_0, RsV, RtV); 33 tcg_gen_mov_i32(RdV, tmp_0); 34 tcg_temp_free_i32(tmp_0); 35 } 36 37The output of the compilation process will be a function, containing the 38tinycode generator code, implementing the correct semantics. That function will 39not access any global variable, because all the accessed data structures will be 40passed explicitly as function parameters. Among the passed parameters we will 41have TCGv (tinycode variables) representing the input and output registers of 42the architecture, integers representing the immediates that come from the code, 43and other data structures which hold information about the disassemblation 44context (``DisasContext`` struct). 45 46Let's begin by describing the input code. The ``add`` instruction is associated 47with a unique identifier, in this case ``A2_add``, which allows to distinguish 48variants of the same instruction, and expresses the class to which the 49instruction belongs, in this case ``A2`` corresponds to the Hexagon 50``ALU32/ALU`` instruction subclass. 51 52After the instruction identifier, we have a series of parameters that represents 53TCG variables that will be passed to the generated function. Parameters marked 54with ``in`` are already initialized, while the others are output parameters. 55 56We will leverage this information to infer several information: 57 58- Fill in the output function signature with the correct TCGv registers 59- Fill in the output function signature with the immediate integers 60- Keep track of which registers, among the declared one, have been 61 initialized 62 63Let's now observe the actual instruction description code, in this case: 64 65.. code:: c 66 67 { RdV=RsV+RtV;} 68 69This code is composed by a subset of the C syntax, and is the result of the 70application of some macro definitions contained in the ``macros.h`` file. 71 72This file is used to reduce the complexity of the input language where complex 73variants of similar constructs can be mapped to a unique primitive, so that the 74idef-parser has to handle a lower number of computation primitives. 75 76As you may notice, the description code modifies the registers which have been 77declared by the declaration statements. In this case all the three registers 78will be declared, ``RsV`` and ``RtV`` will also be read and ``RdV`` will be 79written. 80 81Now let's have a quick look at the generated code, line by line. 82 83:: 84 85 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 86 87This code starts by declaring a temporary TCGv to hold the result from the sum 88operation. 89 90:: 91 92 tcg_gen_add_i32(tmp_0, RsV, RtV); 93 94Then, we are generating the sum tinycode operator between the selected 95registers, storing the result in the just declared temporary. 96 97:: 98 99 tcg_gen_mov_i32(RdV, tmp_0); 100 101The result of the addition is now stored in the temporary, we move it into the 102correct destination register. This code may seem inefficient, but QEMU will 103perform some optimizations on the tinycode, reducing the unnecessary copy. 104 105:: 106 107 tcg_temp_free_i32(tmp_0); 108 109Finally, we free the temporary we used to hold the addition result. 110 111Parser Input 112------------ 113 114Before moving on to the structure of idef-parser itself, let us spend some words 115on its' input. There are two preprocessing steps applied to the generated 116instruction semantics in ``semantics_generated.pyinc`` that we need to consider. 117Firstly, 118 119:: 120 121 gen_idef_parser_funcs.py 122 123which takes instruction semantics in ``semantics_generated.pyinc`` to C-like 124pseudo code, output into ``idef_parser_input.h.inc``. For instance, the 125``J2_jumpr`` instruction which jumps to an address stored in a register 126argument. This is instruction is defined as 127 128:: 129 130 SEMANTICS( \ 131 "J2_jumpr", \ 132 "jumpr Rs32", \ 133 """{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}""" \ 134 ) 135 136in ``semantics_generated.pyinc``. Running ``gen_idef_parser_funcs.py`` 137we obtain the pseudo code 138 139:: 140 141 J2_jumpr(in RsV) { 142 {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);} 143 } 144 145with macros such as ``fJUMPR`` intact. 146 147The second step is to expand macros into a form suitable for our parser. 148These macros are defined in ``idef-parser/macros.inc`` and the step is 149carried out by the ``prepare`` script which runs the C preprocessor on 150``idef_parser_input.h.inc`` to produce 151``idef_parser_input.preprocessed.h.inc``. 152 153To finish the above example, after preprocessing ``J2_jumpr`` we obtain 154 155:: 156 157 J2_jumpr(in RsV) { 158 {(PC = RsV);} 159 } 160 161where ``fJUMPR(RsN,RsV,COF_TYPE_JUMPR);`` was expanded to ``(PC = RsV)``, 162signifying a write to the Program Counter ``PC``. Note, that ``PC`` in 163this expression is not a variable in the strict C sense since it is not 164declared anywhere, but rather a symbol which is easy to match in 165idef-parser later on. 166 167Parser Structure 168---------------- 169 170The idef-parser is built using the ``flex`` and ``bison``. 171 172``flex`` is used to split the input string into tokens, each described using a 173regular expression. The token description is contained in the 174``idef-parser.lex`` source file. The flex-generated scanner takes care also to 175extract from the input text other meaningful information, e.g., the numerical 176value in case of an immediate constant, and decorates the token with the 177extracted information. 178 179``bison`` is used to generate the actual parser, starting from the parsing 180description contained in the ``idef-parser.y`` file. The generated parser 181executes the ``main`` function at the end of the ``idef-parser.y`` file, which 182opens input and output files, creates the parsing context, and eventually calls 183the ``yyparse()`` function, which starts the execution of the LALR(1) parser 184(see `Wikipedia <https://en.wikipedia.org/wiki/LALR_parser>`__ for more 185information about LALR parsing techniques). The LALR(1) parser, whenever it has 186to shift a token, calls the ``yylex()`` function, which is defined by the 187flex-generated code, and reads the input file returning the next scanned token. 188 189The tokens are mapped on the source language grammar, defined in the 190``idef-parser.y`` file to build a unique syntactic tree, according to the 191specified operator precedences and associativity rules. 192 193The grammar describes the whole file which contains the Hexagon instruction 194descriptions, therefore it starts from the ``input`` nonterminal, which is a 195list of instructions, each instruction is represented by the following grammar 196rule, representing the structure of the input file shown above: 197 198:: 199 200 instruction : INAME arguments code 201 | error 202 203 arguments : '(' ')' 204 | '(' argument_list ')'; 205 206 argument_list : argument_decl ',' argument_list 207 | argument_decl 208 209 argument_decl : REG 210 | PRED 211 | IN REG 212 | IN PRED 213 | IMM 214 | var 215 ; 216 217 code : '{' statements '}' 218 219 statements : statements statement 220 | statement 221 222 statement : control_statement 223 | var_decl ';' 224 | rvalue ';' 225 | code_block 226 | ';' 227 228 code_block : '{' statements '}' 229 | '{' '}' 230 231With this initial portion of the grammar we are defining the instruction, its' 232arguments, and its' statements. Each argument is defined by the 233``argument_decl`` rule, and can be either 234 235:: 236 237 Description Example 238 ---------------------------------------- 239 output register RsV 240 output predicate register P0 241 input register in RsV 242 input predicate register in P0 243 immediate value 1234 244 local variable EA 245 246Note, the only local variable allowed to be used as an argument is the effective 247address ``EA``. Similarly, each statement can be a ``control_statement``, a 248variable declaration such as ``int a;``, a code block, which is just a 249bracket-enclosed list of statements, a ``';'``, which is a ``nop`` instruction, 250and an ``rvalue ';'``. 251 252Expressions 253~~~~~~~~~~~ 254 255Allowed in the input code are C language expressions with a few exceptions 256to simplify parsing. For instance, variable names such as ``RdV``, ``RssV``, 257``PdV``, ``CsV``, and other idiomatic register names from Hexagon, are 258reserved specifically for register arguments. These arguments then map to 259``TCGv_i32`` or ``TCGv_i64`` depending on the register size. Similarly, ``UiV``, 260``riV``, etc. refer to immediate arguments and will map to C integers. 261 262Also, as mentioned earlier, the names ``PC``, ``SP``, ``FP``, etc. are used to 263refer to Hexagon registers such as the program counter, stack pointer, and frame 264pointer seen here. Writes to these registers then correspond to assignments 265``PC = ...``, and reads correspond to uses of the variable ``PC``. 266 267Moreover, another example of one such exception is the selective expansion of 268macros present in ``macros.h``. As an example, consider the ``fABS`` macro which 269in plain C is defined as 270 271:: 272 273 #define fABS(A) (((A) < 0) ? (-(A)) : (A)) 274 275and returns the absolute value of the argument ``A``. This macro is not included 276in ``idef-parser/macros.inc`` and as such is not expanded and kept as a "call" 277``fABS(...)``. Reason being, that ``fABS`` is easier to match and map to 278``tcg_gen_abs_<width>``, compared to the full ternary expression above. Loads of 279macros in ``macros.h`` are kept unexpanded to aid in parsing, as seen in the 280example above, for more information see ``idef-parser/idef-parser.lex``. 281 282Finally, in mapping these input expressions to tinycode generators, idef-parser 283tries to perform as much as possible in plain C. Such as, performing binary 284operations in C instead of tinycode generators, thus effectively constant 285folding the expression. 286 287Variables and Variable Declarations 288~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 289 290Similarly to C, variables in the input code must be explicitly declared, such as 291``int var1;`` which declares an uninitialized variable ``var1``. Initialization 292``int var2 = 0;`` is also allowed and behaves as expected. In tinycode 293generators the previous declarations are mapped to 294 295:: 296 297 int var1; -> TCGv_i32 var1 = tcg_temp_local_new_i32(); 298 299 int var2 = 0; -> TCGv_i32 var1 = tcg_temp_local_new_i32(); 300 tcg_gen_movi_i32(j, ((int64_t) 0ULL)); 301 302which are later automatically freed at the end of the function they're declared 303in. Contrary to C, we only allow variables to be declared with an integer type 304specified in the following table (without permutation of keywords) 305 306:: 307 308 type bit-width signedness 309 ---------------------------------------------------------- 310 int 32 signed 311 signed 312 signed int 313 314 unsigned 32 unsigned 315 unsigned int 316 317 long 64 signed 318 long int 319 signed long 320 signed long int 321 322 unsigned long 64 unsigned 323 unsigned long int 324 325 long long 64 signed 326 long long int 327 signed long long 328 signed long long int 329 330 unsigned long long 64 unsigned 331 unsigned long long int 332 333 size[1,2,4,8][s,u]_t 8-64 signed or unsigned 334 335In idef-parser, variable names are matched by a generic ``VARID`` token, 336which will feature the variable name as a decoration. For a variable declaration 337idef-parser calls ``gen_varid_allocate`` with the ``VARID`` token to save the 338name, size, and bit width of the newly declared variable. In addition, this 339function also ensures that variables aren't declared multiple times, and prints 340and error message if that is the case. Upon use of a variable, the ``VARID`` 341token is used to lookup the size and bit width of the variable. 342 343Type System 344~~~~~~~~~~~ 345 346idef-parser features a simple type system which is used to correctly implement 347the signedness and bit width of the operations. 348 349The type of each ``rvalue`` is determined by two attributes: its bit width 350(``unsigned bit_width``) and its signedness (``HexSignedness signedness``). 351 352For each operation, the type of ``rvalue``\ s influence the way in which the 353operands are handled and emitted. For example a right shift between signed 354operators will be an arithmetic shift, while one between unsigned operators 355will be a logical shift. If one of the two operands is signed, and the other 356is unsigned, the operation will be signed. 357 358The bit width also influences the outcome of the operations, in particular while 359the input languages features a fine granularity type system, with types of 8, 36016, 32, 64 (and more for vectorial instructions) bits, the tinycode only 361features 32 and 64 bit widths. We propagate as much as possible the fine 362granularity type, until the value has to be used inside an operation between 363``rvalue``\ s; in that case if one of the two operands is greater than 32 bits 364we promote the whole operation to 64 bit, taking care of properly extending the 365two operands. Fortunately, the most critical instructions already feature 366explicit casts and zero/sign extensions which are properly propagated down to 367our parser. 368 369The combination of ``rvalue``\ s are handled through the use of the 370``gen_bin_op`` and ``gen_bin_cmp`` helper functions. These two functions handle 371the appropriate compile-time or run-time emission of operations to perform the 372required computation. 373 374Control Statements 375~~~~~~~~~~~~~~~~~~ 376 377``control_statement``\ s are all the statements which modify the order of 378execution of the generated code according to input parameters. They are expanded 379by the following grammar rule: 380 381:: 382 383 control_statement : frame_check 384 | cancel_statement 385 | if_statement 386 | for_statement 387 | fpart1_statement 388 389``if_statement``\ s require the emission of labels and branch instructions which 390effectively perform conditional jumps (``tcg_gen_brcondi``) according to the 391value of an expression. Note, the tinycode generators we produce for conditional 392statements do not perfectly mirror what would be expected in C, for instance we 393do not reproduce short-circuiting of the ``&&`` operator, and use of the ``||`` 394operator is disallowed. All the predicated instructions, and in general all the 395instructions where there could be alternative values assigned to an ``lvalue``, 396like C-style ternary expressions: 397 398:: 399 400 rvalue : rvalue QMARK rvalue COLON rvalue 401 402are handled using the conditional move tinycode instruction 403(``tcg_gen_movcond``), which avoids the additional complexity of managing labels 404and jumps. 405 406Instead, regarding the ``for`` loops, exploiting the fact that they always 407iterate on immediate values, therefore their iteration ranges are always known 408at compile time, we implemented those emitting plain C ``for`` loops. This is 409possible because the loops will be executed in the QEMU code, leading to the 410consequential unrolling of the for loop, since the tinycode generator 411instructions will be executed multiple times, and the respective generated 412tinycode will represent the unrolled execution of the loop. 413 414Parsing Context 415~~~~~~~~~~~~~~~ 416 417All the helper functions in ``idef-parser.y`` carry two fixed parameters, which 418are the parsing context ``c`` and the ``YYLLOC`` location information. The 419context is explicitly passed to all the functions because the parser we generate 420is a reentrant one, meaning that it does not have any global variable, and 421therefore the instruction compilation could easily be parallelized in the 422future. Finally for each rule we propagate information about the location of the 423involved tokens to generate pretty error reporting, able to highlight the 424portion of the input code which generated each error. 425 426Debugging 427--------- 428 429Developing the idef-parser can lead to two types of errors: compile-time errors 430and parsing errors. 431 432Compile-time errors in Bison-generated parsers are usually due to conflicts in 433the described grammar. Conflicts forbid the grammar to produce a unique 434derivation tree, thus must be solved (except for the dangling else problem, 435which is marked as expected through the ``%expect 1`` Bison option). 436 437For solving conflicts you need a basic understanding of `shift-reduce conflicts 438<https://www.gnu.org/software/Bison/manual/html_node/Shift_002fReduce.html>`__ 439and `reduce-reduce conflicts 440<https://www.gnu.org/software/Bison/manual/html_node/Reduce_002fReduce.html>`__, 441then, if you are using a Bison version > 3.7.1 you can ask Bison to generate 442some counterexamples which highlight ambiguous derivations, passing the 443``-Wcex`` option to Bison. In general shift/reduce conflicts are solved by 444redesigning the grammar in an unambiguous way or by setting the token priority 445correctly, while reduce/reduce conflicts are solved by redesigning the 446interested part of the grammar. 447 448Run-time errors can be divided between lexing and parsing errors, lexing errors 449are hard to detect, since the ``var`` token will catch everything which is not 450catched by other tokens, but easy to fix, because most of the time a simple 451regex editing will be enough. 452 453idef-parser features a fancy parsing error reporting scheme, which for each 454parsing error reports the fragment of the input text which was involved in the 455parsing rule that generated an error. 456 457Implementing an instruction goes through several sequential steps, here are some 458suggestions to make each instruction proceed to the next step. 459 460- not-emitted 461 462 Means that the parsing of the input code relative to that instruction failed, 463 this could be due to a lexical error or to some mismatch between the order of 464 valid tokens and a parser rule. You should check that tokens are correctly 465 identified and mapped, and that there is a rule matching the token sequence 466 that you need to parse. 467 468- emitted 469 470 This instruction class contains all the instructions which are emitted but 471 fail to compile when included in QEMU. The compilation errors are shown by 472 the QEMU building process and will lead to fixing the bug. Most common 473 errors regard the mismatch of parameters for tinycode generator functions, 474 which boil down to errors in the idef-parser type system. 475 476- compiled 477 478 These instruction generate valid tinycode generator code, which however fail 479 the QEMU or the harness tests, these cases must be handled manually by 480 looking into the failing tests and looking at the generated tinycode 481 generator instruction and at the generated tinycode itself. Tip: handle the 482 failing harness tests first, because they usually feature only a single 483 instruction, thus will require less execution trace navigation. If a 484 multi-threaded test fail, fixing all the other tests will be the easier 485 option, hoping that the multi-threaded one will be indirectly fixed. 486 487 An example of debugging this type of failure is provided in the following 488 section. 489 490- tests-passed 491 492 This is the final goal for each instruction, meaning that the instruction 493 passes the test suite. 494 495Another approach to fix QEMU system test, where many instructions might fail, is 496to compare the execution trace of your implementation with the reference 497implementations already present in QEMU. To do so you should obtain a QEMU build 498where the instruction pass the test, and run it with the following command: 499 500:: 501 502 sudo unshare -p sudo -u <USER> bash -c \ 503 'env -i <qemu-hexagon full path> -d cpu <TEST>' 504 505And do the same for your implementation, the generated execution traces will be 506inherently aligned and can be inspected for behavioral differences using the 507``diff`` tool. 508 509Example of debugging erroneous tinycode generator code 510~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 511 512The goal of this section is to provide a complete example of debugging 513incorrectly emitted tinycode generator for a single instruction. 514 515Let's first introduce a bug in the tinycode generator of the ``A2_add`` 516instruction, 517 518:: 519 520 void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV, 521 TCGv_i32 RsV, TCGv_i32 RtV) 522 /* RdV=RsV+RtV;} */ 523 { 524 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 525 tcg_gen_add_i32(tmp_0, RsV, RsV); 526 tcg_gen_mov_i32(RdV, tmp_0); 527 tcg_temp_free_i32(tmp_0); 528 } 529 530Here the bug, albeit hard to spot, is in ``tcg_gen_add_i32(tmp_0, RsV, RsV);`` 531where we compute ``RsV + RsV`` instead of ``RsV + RtV``, as would be expected. 532This particular bug is a bit tricky to pinpoint when debugging, since the 533``A2_add`` instruction is so ubiquitous. As a result, pretty much all tests will 534fail and therefore not provide a lot of information about the bug. 535 536For example, let's run the ``check-tcg`` tests 537 538:: 539 540 make check-tcg TIMEOUT=1200 \ 541 DOCKER_IMAGE=debian-hexagon-cross \ 542 ENGINE=podman V=1 \ 543 DOCKER_CROSS_CC_GUEST=hexagon-unknown-linux-musl-clang 544 545In the output, we find a failure in the very first test case ``float_convs`` 546due to a segmentation fault. Similarly, all harness and libc tests will fail as 547well. At this point we have no clue where the actual bug lies, and need to start 548ruling out instructions. As such a good starting point is to utilize the debug 549options ``-d in_asm,cpu`` of QEMU to inspect the Hexagon instructions being run, 550alongside the CPU state. We additionally need a working version of the emulator 551to compare our buggy CPU state against, running 552 553:: 554 555 meson configure -Dhexagon_idef_parser=false 556 557will disable the idef-parser for all instructions and fallback on manual 558tinycode generator overrides, or on helper function implementations. Recompiling 559gives us ``qemu-hexagon`` which passes all tests. If ``qemu-hexagon-buggy`` is 560our binary with the incorrect tinycode generators, we can compare the CPU state 561between the two versions 562 563:: 564 565 ./qemu-hexagon-buggy -d in_asm,cpu float_convs &> out_buggy 566 ./qemu-hexagon -d in_asm,cpu float_convs &> out_working 567 568Looking at ``diff -u out_buggy out_working`` shows us that the CPU state begins 569to diverge on line 141, with an incorrect value in the ``R1`` register 570 571:: 572 573 @@ -138,7 +138,7 @@ 574 575 General Purpose Registers = { 576 r0 = 0x4100f9c0 577 - r1 = 0x00042108 578 + r1 = 0x00000000 579 r2 = 0x00021084 580 r3 = 0x00000000 581 r4 = 0x00000000 582 583If we also look into ``out_buggy`` directly we can inspect the input assembly 584which the caused the incorrect CPU state, around line 141 we find 585 586:: 587 588 116 | ---------------- 589 117 | IN: _start_c 590 118 | 0x000210b0: 0xa09dc002 { allocframe(R29,#0x10):raw } 591 ... | ... 592 137 | 0x000210fc: 0x5a00c4aa { call PC+2388 } 593 138 | 594 139 | General Purpose Registers = { 595 140 | r0 = 0x4100fa70 596 141 | r1 = 0x00042108 597 142 | r2 = 0x00021084 598 143 | r3 = 0x00000000 599 600Importantly, we see some Hexagon assembly followed by a dump of the CPU state, 601now the CPU state is actually dumped before the input assembly above is ran. 602As such, we are actually interested in the instructions ran before this. 603 604Scrolling up a bit, we find 605 606:: 607 608 54 | ---------------- 609 55 | IN: _start 610 56 | 0x00021088: 0x6a09c002 { R2 = C9/pc } 611 57 | 0x0002108c: 0xbfe2ff82 { R2 = add(R2,#0xfffffffc) } 612 58 | 0x00021090: 0x9182c001 { R1 = memw(R2+#0x0) } 613 59 | 0x00021094: 0xf302c101 { R1 = add(R2,R1) } 614 60 | 0x00021098: 0x7800c01e { R30 = #0x0 } 615 61 | 0x0002109c: 0x707dc000 { R0 = R29 } 616 62 | 0x000210a0: 0x763dfe1d { R29 = and(R29,#0xfffffff0) } 617 63 | 0x000210a4: 0xa79dfdfe { memw(R29+#0xfffffff8) = R29 } 618 64 | 0x000210a8: 0xbffdff1d { R29 = add(R29,#0xfffffff8) } 619 65 | 0x000210ac: 0x5a00c002 { call PC+4 } 620 66 | 621 67 | General Purpose Registers = { 622 68 | r0 = 0x00000000 623 69 | r1 = 0x00000000 624 70 | r2 = 0x00000000 625 71 | r3 = 0x00000000 626 627Remember, the instructions on lines 56-65 are ran on the CPU state shown below 628instructions, and as the CPU state has not diverged at this point, we know the 629starting state is accurate. The bug must then lie within the instructions shown 630here. Next we may notice that ``R1`` is only touched by lines 57 and 58, that is 631by 632 633:: 634 635 58 | 0x00021090: 0x9182c001 { R1 = memw(R2+#0x0) } 636 59 | 0x00021094: 0xf302c101 { R1 = add(R2,R1) } 637 638Therefore, we are either dealing with an correct load instruction 639``R1 = memw(R2+#0x0)`` or with an incorrect add ``R1 = add(R2,R1)``. At this 640point it might be easy enough to go directly to the emitted code for the 641instructions mentioned and look for bugs, but we could also run 642``./qemu-heaxgon -d op,in_asm float_conv`` where we find for the following 643tinycode for the Hexagon ``add`` instruction 644 645:: 646 647 ---- 00021094 648 mov_i32 pkt_has_store_s1,$0x0 649 add_i32 tmp0,r2,r2 650 mov_i32 loc2,tmp0 651 mov_i32 new_r1,loc2 652 mov_i32 r1,new_r1 653 654Here we have finally located our bug ``add_i32 tmp0,r2,r2``. 655 656Limitations and Future Development 657---------------------------------- 658 659The main limitation of the current parser is given by the syntax-driven nature 660of the Bison-generated parsers. This has the severe implication of only being 661able to generate code in the order of evaluation of the various rules, without, 662in any case, being able to backtrack and alter the generated code. 663 664An example limitation is highlighted by this statement of the input language: 665 666:: 667 668 { (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); } 669 670This ternary assignment, when written in this form requires us to emit some 671proper control flow statements, which emit a jump to the first or to the second 672code block, whose implementation is extremely convoluted, because when matching 673the ternary assignment, the code evaluating the two assignments will be already 674generated. 675 676Instead we pre-process that statement, making it become: 677 678:: 679 680 { PdV = ((PsV==0xff)) ? 0xff : 0x00; } 681 682Which can be easily matched by the following parser rules: 683 684:: 685 686 statement | rvalue ';' 687 688 rvalue : rvalue QMARK rvalue COLON rvalue 689 | rvalue EQ rvalue 690 | LPAR rvalue RPAR 691 | assign_statement 692 | IMM 693 694 assign_statement : pred ASSIGN rvalue 695 696Another example that highlight the limitation of the flex/bison parser can be 697found even in the add operation we already saw: 698 699:: 700 701 TCGv_i32 tmp_0 = tcg_temp_new_i32(); 702 tcg_gen_add_i32(tmp_0, RsV, RtV); 703 tcg_gen_mov_i32(RdV, tmp_0); 704 705The fact that we cannot directly use ``RdV`` as the destination of the sum is a 706consequence of the syntax-driven nature of the parser. In fact when we parse the 707assignment, the ``rvalue`` token, representing the sum has already been reduced, 708and thus its code emitted and unchangeable. We rely on the fact that QEMU will 709optimize our code reducing the useless move operations and the relative 710temporaries. 711 712A possible improvement of the parser regards the support for vectorial 713instructions and floating point instructions, which will require the extension 714of the scanner, the parser, and a partial re-design of the type system, allowing 715to build the vectorial semantics over the available vectorial tinycode generator 716primitives. 717 718A more radical improvement will use the parser, not to generate directly the 719tinycode generator code, but to generate an intermediate representation like the 720LLVM IR, which in turn could be compiled using the clang TCG backend. That code 721could be furtherly optimized, overcoming the limitations of the syntax-driven 722parsing and could lead to a more optimized generated code. 723