1Hexagon ISA instruction definitions to tinycode generator compiler
2------------------------------------------------------------------
3
4idef-parser is a small compiler able to translate the Hexagon ISA description
5language into tinycode generator code, that can be easily integrated into QEMU.
6
7Compilation Example
8-------------------
9
10To better understand the scope of the idef-parser, we'll explore an applicative
11example. Let's start by one of the simplest Hexagon instruction: the ``add``.
12
13The ISA description language represents the ``add`` instruction as
14follows:
15
16.. code:: c
17
18   A2_add(RdV, in RsV, in RtV) {
19       { RdV=RsV+RtV;}
20   }
21
22idef-parser will compile the above code into the following code:
23
24.. code:: c
25
26   /* A2_add */
27   void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
28                    TCGv_i32 RsV, TCGv_i32 RtV)
29   /*  { RdV=RsV+RtV;} */
30   {
31       TCGv_i32 tmp_0 = tcg_temp_new_i32();
32       tcg_gen_add_i32(tmp_0, RsV, RtV);
33       tcg_gen_mov_i32(RdV, tmp_0);
34       tcg_temp_free_i32(tmp_0);
35   }
36
37The output of the compilation process will be a function, containing the
38tinycode generator code, implementing the correct semantics. That function will
39not access any global variable, because all the accessed data structures will be
40passed explicitly as function parameters. Among the passed parameters we will
41have TCGv (tinycode variables) representing the input and output registers of
42the architecture, integers representing the immediates that come from the code,
43and other data structures which hold information about the disassemblation
44context (``DisasContext`` struct).
45
46Let's begin by describing the input code. The ``add`` instruction is associated
47with a unique identifier, in this case ``A2_add``, which allows to distinguish
48variants of the same instruction, and expresses the class to which the
49instruction belongs, in this case ``A2`` corresponds to the Hexagon
50``ALU32/ALU`` instruction subclass.
51
52After the instruction identifier, we have a series of parameters that represents
53TCG variables that will be passed to the generated function. Parameters marked
54with ``in`` are already initialized, while the others are output parameters.
55
56We will leverage this information to infer several information:
57
58-  Fill in the output function signature with the correct TCGv registers
59-  Fill in the output function signature with the immediate integers
60-  Keep track of which registers, among the declared one, have been
61   initialized
62
63Let's now observe the actual instruction description code, in this case:
64
65.. code:: c
66
67   { RdV=RsV+RtV;}
68
69This code is composed by a subset of the C syntax, and is the result of the
70application of some macro definitions contained in the ``macros.h`` file.
71
72This file is used to reduce the complexity of the input language where complex
73variants of similar constructs can be mapped to a unique primitive, so that the
74idef-parser has to handle a lower number of computation primitives.
75
76As you may notice, the description code modifies the registers which have been
77declared by the declaration statements. In this case all the three registers
78will be declared, ``RsV`` and ``RtV`` will also be read and ``RdV`` will be
79written.
80
81Now let's have a quick look at the generated code, line by line.
82
83::
84
85   TCGv_i32 tmp_0 = tcg_temp_new_i32();
86
87This code starts by declaring a temporary TCGv to hold the result from the sum
88operation.
89
90::
91
92   tcg_gen_add_i32(tmp_0, RsV, RtV);
93
94Then, we are generating the sum tinycode operator between the selected
95registers, storing the result in the just declared temporary.
96
97::
98
99   tcg_gen_mov_i32(RdV, tmp_0);
100
101The result of the addition is now stored in the temporary, we move it into the
102correct destination register. This code may seem inefficient, but QEMU will
103perform some optimizations on the tinycode, reducing the unnecessary copy.
104
105::
106
107   tcg_temp_free_i32(tmp_0);
108
109Finally, we free the temporary we used to hold the addition result.
110
111Parser Input
112------------
113
114Before moving on to the structure of idef-parser itself, let us spend some words
115on its' input. There are two preprocessing steps applied to the generated
116instruction semantics in ``semantics_generated.pyinc`` that we need to consider.
117Firstly,
118
119::
120
121    gen_idef_parser_funcs.py
122
123which takes instruction semantics in ``semantics_generated.pyinc`` to C-like
124pseudo code, output into ``idef_parser_input.h.inc``. For instance, the
125``J2_jumpr`` instruction which jumps to an address stored in a register
126argument. This is instruction is defined as
127
128::
129
130    SEMANTICS( \
131        "J2_jumpr", \
132        "jumpr Rs32", \
133        """{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}""" \
134    )
135
136in ``semantics_generated.pyinc``. Running ``gen_idef_parser_funcs.py``
137we obtain the pseudo code
138
139::
140
141    J2_jumpr(in RsV) {
142        {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}
143    }
144
145with macros such as ``fJUMPR`` intact.
146
147The second step is to expand macros into a form suitable for our parser.
148These macros are defined in ``idef-parser/macros.inc`` and the step is
149carried out by the ``prepare`` script which runs the C preprocessor on
150``idef_parser_input.h.inc`` to produce
151``idef_parser_input.preprocessed.h.inc``.
152
153To finish the above example, after preprocessing ``J2_jumpr`` we obtain
154
155::
156
157    J2_jumpr(in RsV) {
158        {(PC = RsV);}
159    }
160
161where ``fJUMPR(RsN,RsV,COF_TYPE_JUMPR);`` was expanded to ``(PC = RsV)``,
162signifying a write to the Program Counter ``PC``.  Note, that ``PC`` in
163this expression is not a variable in the strict C sense since it is not
164declared anywhere, but rather a symbol which is easy to match in
165idef-parser later on.
166
167Parser Structure
168----------------
169
170The idef-parser is built using the ``flex`` and ``bison``.
171
172``flex`` is used to split the input string into tokens, each described using a
173regular expression. The token description is contained in the
174``idef-parser.lex`` source file. The flex-generated scanner takes care also to
175extract from the input text other meaningful information, e.g., the numerical
176value in case of an immediate constant, and decorates the token with the
177extracted information.
178
179``bison`` is used to generate the actual parser, starting from the parsing
180description contained in the ``idef-parser.y`` file. The generated parser
181executes the ``main`` function at the end of the ``idef-parser.y`` file, which
182opens input and output files, creates the parsing context, and eventually calls
183the ``yyparse()`` function, which starts the execution of the LALR(1) parser
184(see `Wikipedia <https://en.wikipedia.org/wiki/LALR_parser>`__ for more
185information about LALR parsing techniques). The LALR(1) parser, whenever it has
186to shift a token, calls the ``yylex()`` function, which is defined by the
187flex-generated code, and reads the input file returning the next scanned token.
188
189The tokens are mapped on the source language grammar, defined in the
190``idef-parser.y`` file to build a unique syntactic tree, according to the
191specified operator precedences and associativity rules.
192
193The grammar describes the whole file which contains the Hexagon instruction
194descriptions, therefore it starts from the ``input`` nonterminal, which is a
195list of instructions, each instruction is represented by the following grammar
196rule, representing the structure of the input file shown above:
197
198::
199
200   instruction : INAME arguments code
201               | error
202
203   arguments : '(' ')'
204             | '(' argument_list ')';
205
206   argument_list : argument_decl ',' argument_list
207                 | argument_decl
208
209   argument_decl : REG
210                 | PRED
211                 | IN REG
212                 | IN PRED
213                 | IMM
214                 | var
215                 ;
216
217   code        : '{' statements '}'
218
219   statements  : statements statement
220               | statement
221
222   statement   : control_statement
223               | var_decl ';'
224               | rvalue ';'
225               | code_block
226               | ';'
227
228   code_block  : '{' statements '}'
229               | '{' '}'
230
231With this initial portion of the grammar we are defining the instruction, its'
232arguments, and its' statements. Each argument is defined by the
233``argument_decl`` rule, and can be either
234
235::
236
237    Description                  Example
238    ----------------------------------------
239    output register              RsV
240    output predicate register    P0
241    input register               in RsV
242    input predicate register     in P0
243    immediate value              1234
244    local variable               EA
245
246Note, the only local variable allowed to be used as an argument is the effective
247address ``EA``. Similarly, each statement can be a ``control_statement``, a
248variable declaration such as ``int a;``, a code block, which is just a
249bracket-enclosed list of statements, a ``';'``, which is a ``nop`` instruction,
250and an ``rvalue ';'``.
251
252Expressions
253~~~~~~~~~~~
254
255Allowed in the input code are C language expressions with a few exceptions
256to simplify parsing. For instance, variable names such as ``RdV``, ``RssV``,
257``PdV``, ``CsV``, and other idiomatic register names from Hexagon, are
258reserved specifically for register arguments. These arguments then map to
259``TCGv_i32`` or ``TCGv_i64`` depending on the register size. Similarly, ``UiV``,
260``riV``, etc. refer to immediate arguments and will map to C integers.
261
262Also, as mentioned earlier, the names ``PC``, ``SP``, ``FP``, etc. are used to
263refer to Hexagon registers such as the program counter, stack pointer, and frame
264pointer seen here. Writes to these registers then correspond to assignments
265``PC = ...``, and reads correspond to uses of the variable ``PC``.
266
267Moreover, another example of one such exception is the selective expansion of
268macros present in ``macros.h``. As an example, consider the ``fABS`` macro which
269in plain C is defined as
270
271::
272
273    #define fABS(A) (((A) < 0) ? (-(A)) : (A))
274
275and returns the absolute value of the argument ``A``. This macro is not included
276in ``idef-parser/macros.inc`` and as such is not expanded and kept as a "call"
277``fABS(...)``. Reason being, that ``fABS`` is easier to match and map to
278``tcg_gen_abs_<width>``, compared to the full ternary expression above. Loads of
279macros in ``macros.h`` are kept unexpanded to aid in parsing, as seen in the
280example above, for more information see ``idef-parser/idef-parser.lex``.
281
282Finally, in mapping these input expressions to tinycode generators, idef-parser
283tries to perform as much as possible in plain C. Such as, performing binary
284operations in C instead of tinycode generators, thus effectively constant
285folding the expression.
286
287Variables and Variable Declarations
288~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
289
290Similarly to C, variables in the input code must be explicitly declared, such as
291``int var1;`` which declares an uninitialized variable ``var1``. Initialization
292``int var2 = 0;`` is also allowed and behaves as expected. In tinycode
293generators the previous declarations are mapped to
294
295::
296
297    int var1;           ->      TCGv_i32 var1 = tcg_temp_new_i32();
298
299    int var2 = 0;       ->      TCGv_i32 var1 = tcg_temp_new_i32();
300                                tcg_gen_movi_i32(j, ((int64_t) 0ULL));
301
302which are later automatically freed at the end of the function they're declared
303in. Contrary to C, we only allow variables to be declared with an integer type
304specified in the following table (without permutation of keywords)
305
306::
307
308    type                        bit-width    signedness
309    ----------------------------------------------------------
310    int                         32           signed
311    signed
312    signed int
313
314    unsigned                    32           unsigned
315    unsigned int
316
317    long                        64           signed
318    long int
319    signed long
320    signed long int
321
322    unsigned long               64           unsigned
323    unsigned long int
324
325    long long                   64           signed
326    long long int
327    signed long long
328    signed long long int
329
330    unsigned long long          64           unsigned
331    unsigned long long int
332
333    size[1,2,4,8][s,u]_t        8-64         signed or unsigned
334
335In idef-parser, variable names are matched by a generic ``VARID`` token,
336which will feature the variable name as a decoration. For a variable declaration
337idef-parser calls ``gen_varid_allocate`` with the ``VARID`` token to save the
338name, size, and bit width of the newly declared variable. In addition, this
339function also ensures that variables aren't declared multiple times, and prints
340and error message if that is the case. Upon use of a variable, the ``VARID``
341token is used to lookup the size and bit width of the variable.
342
343Type System
344~~~~~~~~~~~
345
346idef-parser features a simple type system which is used to correctly implement
347the signedness and bit width of the operations.
348
349The type of each ``rvalue`` is determined by two attributes: its bit width
350(``unsigned bit_width``) and its signedness (``HexSignedness signedness``).
351
352For each operation, the type of ``rvalue``\ s influence the way in which the
353operands are handled and emitted. For example a right shift between signed
354operators will be an arithmetic shift, while one between unsigned operators
355will be a logical shift. If one of the two operands is signed, and the other
356is unsigned, the operation will be signed.
357
358The bit width also influences the outcome of the operations, in particular while
359the input languages features a fine granularity type system, with types of 8,
36016, 32, 64 (and more for vectorial instructions) bits, the tinycode only
361features 32 and 64 bit widths. We propagate as much as possible the fine
362granularity type, until the value has to be used inside an operation between
363``rvalue``\ s; in that case if one of the two operands is greater than 32 bits
364we promote the whole operation to 64 bit, taking care of properly extending the
365two operands. Fortunately, the most critical instructions already feature
366explicit casts and zero/sign extensions which are properly propagated down to
367our parser.
368
369The combination of ``rvalue``\ s are handled through the use of the
370``gen_bin_op`` and ``gen_bin_cmp`` helper functions. These two functions handle
371the appropriate compile-time or run-time emission of operations to perform the
372required computation.
373
374Control Statements
375~~~~~~~~~~~~~~~~~~
376
377``control_statement``\ s are all the statements which modify the order of
378execution of the generated code according to input parameters. They are expanded
379by the following grammar rule:
380
381::
382
383   control_statement : frame_check
384                     | cancel_statement
385                     | if_statement
386                     | for_statement
387                     | fpart1_statement
388
389``if_statement``\ s require the emission of labels and branch instructions which
390effectively perform conditional jumps (``tcg_gen_brcondi``) according to the
391value of an expression. Note, the tinycode generators we produce for conditional
392statements do not perfectly mirror what would be expected in C, for instance we
393do not reproduce short-circuiting of the ``&&`` operator, and use of the ``||``
394operator is disallowed. All the predicated instructions, and in general all the
395instructions where there could be alternative values assigned to an ``lvalue``,
396like C-style ternary expressions:
397
398::
399
400   rvalue            : rvalue QMARK rvalue COLON rvalue
401
402are handled using the conditional move tinycode instruction
403(``tcg_gen_movcond``), which avoids the additional complexity of managing labels
404and jumps.
405
406Instead, regarding the ``for`` loops, exploiting the fact that they always
407iterate on immediate values, therefore their iteration ranges are always known
408at compile time, we implemented those emitting plain C ``for`` loops. This is
409possible because the loops will be executed in the QEMU code, leading to the
410consequential unrolling of the for loop, since the tinycode generator
411instructions will be executed multiple times, and the respective generated
412tinycode will represent the unrolled execution of the loop.
413
414Parsing Context
415~~~~~~~~~~~~~~~
416
417All the helper functions in ``idef-parser.y`` carry two fixed parameters, which
418are the parsing context ``c`` and the ``YYLLOC`` location information. The
419context is explicitly passed to all the functions because the parser we generate
420is a reentrant one, meaning that it does not have any global variable, and
421therefore the instruction compilation could easily be parallelized in the
422future. Finally for each rule we propagate information about the location of the
423involved tokens to generate pretty error reporting, able to highlight the
424portion of the input code which generated each error.
425
426Debugging
427---------
428
429Developing the idef-parser can lead to two types of errors: compile-time errors
430and parsing errors.
431
432Compile-time errors in Bison-generated parsers are usually due to conflicts in
433the described grammar. Conflicts forbid the grammar to produce a unique
434derivation tree, thus must be solved (except for the dangling else problem,
435which is marked as expected through the ``%expect 1`` Bison option).
436
437For solving conflicts you need a basic understanding of `shift-reduce conflicts
438<https://www.gnu.org/software/Bison/manual/html_node/Shift_002fReduce.html>`__
439and `reduce-reduce conflicts
440<https://www.gnu.org/software/Bison/manual/html_node/Reduce_002fReduce.html>`__,
441then, if you are using a Bison version > 3.7.1 you can ask Bison to generate
442some counterexamples which highlight ambiguous derivations, passing the
443``-Wcex`` option to Bison. In general shift/reduce conflicts are solved by
444redesigning the grammar in an unambiguous way or by setting the token priority
445correctly, while reduce/reduce conflicts are solved by redesigning the
446interested part of the grammar.
447
448Run-time errors can be divided between lexing and parsing errors, lexing errors
449are hard to detect, since the ``var`` token will catch everything which is not
450catched by other tokens, but easy to fix, because most of the time a simple
451regex editing will be enough.
452
453idef-parser features a fancy parsing error reporting scheme, which for each
454parsing error reports the fragment of the input text which was involved in the
455parsing rule that generated an error.
456
457Implementing an instruction goes through several sequential steps, here are some
458suggestions to make each instruction proceed to the next step.
459
460-  not-emitted
461
462   Means that the parsing of the input code relative to that instruction failed,
463   this could be due to a lexical error or to some mismatch between the order of
464   valid tokens and a parser rule. You should check that tokens are correctly
465   identified and mapped, and that there is a rule matching the token sequence
466   that you need to parse.
467
468-  emitted
469
470   This instruction class contains all the instructions which are emitted but
471   fail to compile when included in QEMU. The compilation errors are shown by
472   the QEMU building process and will lead to fixing the bug.  Most common
473   errors regard the mismatch of parameters for tinycode generator functions,
474   which boil down to errors in the idef-parser type system.
475
476-  compiled
477
478   These instruction generate valid tinycode generator code, which however fail
479   the QEMU or the harness tests, these cases must be handled manually by
480   looking into the failing tests and looking at the generated tinycode
481   generator instruction and at the generated tinycode itself. Tip: handle the
482   failing harness tests first, because they usually feature only a single
483   instruction, thus will require less execution trace navigation. If a
484   multi-threaded test fail, fixing all the other tests will be the easier
485   option, hoping that the multi-threaded one will be indirectly fixed.
486
487   An example of debugging this type of failure is provided in the following
488   section.
489
490-  tests-passed
491
492   This is the final goal for each instruction, meaning that the instruction
493   passes the test suite.
494
495Another approach to fix QEMU system test, where many instructions might fail, is
496to compare the execution trace of your implementation with the reference
497implementations already present in QEMU. To do so you should obtain a QEMU build
498where the instruction pass the test, and run it with the following command:
499
500::
501
502   sudo unshare -p sudo -u <USER> bash -c \
503   'env -i <qemu-hexagon full path> -d cpu <TEST>'
504
505And do the same for your implementation, the generated execution traces will be
506inherently aligned and can be inspected for behavioral differences using the
507``diff`` tool.
508
509Example of debugging erroneous tinycode generator code
510~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
511
512The goal of this section is to provide a complete example of debugging
513incorrectly emitted tinycode generator for a single instruction.
514
515Let's first introduce a bug in the tinycode generator of the ``A2_add``
516instruction,
517
518::
519
520    void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
521                     TCGv_i32 RsV, TCGv_i32 RtV)
522    /*  RdV=RsV+RtV;} */
523    {
524        TCGv_i32 tmp_0 = tcg_temp_new_i32();
525        tcg_gen_add_i32(tmp_0, RsV, RsV);
526        tcg_gen_mov_i32(RdV, tmp_0);
527        tcg_temp_free_i32(tmp_0);
528    }
529
530Here the bug, albeit hard to spot, is in ``tcg_gen_add_i32(tmp_0, RsV, RsV);``
531where we compute ``RsV + RsV`` instead of ``RsV + RtV``, as would be expected.
532This particular bug is a bit tricky to pinpoint when debugging, since the
533``A2_add`` instruction is so ubiquitous. As a result, pretty much all tests will
534fail and therefore not provide a lot of information about the bug.
535
536For example, let's run the ``check-tcg`` tests
537
538::
539
540    make check-tcg TIMEOUT=1200 \
541                   DOCKER_IMAGE=debian-hexagon-cross \
542                   ENGINE=podman V=1 \
543                   DOCKER_CROSS_CC_GUEST=hexagon-unknown-linux-musl-clang
544
545In the output, we find a failure in the very first test case ``float_convs``
546due to a segmentation fault. Similarly, all harness and libc tests will fail as
547well. At this point we have no clue where the actual bug lies, and need to start
548ruling out instructions. As such a good starting point is to utilize the debug
549options ``-d in_asm,cpu`` of QEMU to inspect the Hexagon instructions being run,
550alongside the CPU state. We additionally need a working version of the emulator
551to compare our buggy CPU state against, running
552
553::
554
555    meson configure -Dhexagon_idef_parser=false
556
557will disable the idef-parser for all instructions and fallback on manual
558tinycode generator overrides, or on helper function implementations. Recompiling
559gives us ``qemu-hexagon`` which passes all tests. If ``qemu-hexagon-buggy`` is
560our binary with the incorrect tinycode generators, we can compare the CPU state
561between the two versions
562
563::
564
565    ./qemu-hexagon-buggy -d in_asm,cpu float_convs &> out_buggy
566    ./qemu-hexagon       -d in_asm,cpu float_convs &> out_working
567
568Looking at ``diff -u out_buggy out_working`` shows us that the CPU state begins
569to diverge on line 141, with an incorrect value in the ``R1`` register
570
571::
572
573    @@ -138,7 +138,7 @@
574
575     General Purpose Registers = {
576       r0 = 0x4100f9c0
577    -  r1 = 0x00042108
578    +  r1 = 0x00000000
579       r2 = 0x00021084
580       r3 = 0x00000000
581       r4 = 0x00000000
582
583If we also look into ``out_buggy`` directly we can inspect the input assembly
584which the caused the incorrect CPU state, around line 141 we find
585
586::
587
588    116 |  ----------------
589    117 |  IN: _start_c
590    118 |  0x000210b0:  0xa09dc002	{	allocframe(R29,#0x10):raw }
591    ... |  ...
592    137 |  0x000210fc:  0x5a00c4aa	{	call PC+2388 }
593    138 |
594    139 |  General Purpose Registers = {
595    140 |    r0 = 0x4100fa70
596    141 |    r1 = 0x00042108
597    142 |    r2 = 0x00021084
598    143 |    r3 = 0x00000000
599
600Importantly, we see some Hexagon assembly followed by a dump of the CPU state,
601now the CPU state is actually dumped before the input assembly above is ran.
602As such, we are actually interested in the instructions ran before this.
603
604Scrolling up a bit, we find
605
606::
607
608    54 |  ----------------
609    55 |  IN: _start
610    56 |  0x00021088:  0x6a09c002	{	R2 = C9/pc }
611    57 |  0x0002108c:  0xbfe2ff82	{	R2 = add(R2,#0xfffffffc) }
612    58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
613    59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
614    60 |  0x00021098:  0x7800c01e	{	R30 = #0x0 }
615    61 |  0x0002109c:  0x707dc000	{	R0 = R29 }
616    62 |  0x000210a0:  0x763dfe1d	{	R29 = and(R29,#0xfffffff0) }
617    63 |  0x000210a4:  0xa79dfdfe	{	memw(R29+#0xfffffff8) = R29 }
618    64 |  0x000210a8:  0xbffdff1d	{	R29 = add(R29,#0xfffffff8) }
619    65 |  0x000210ac:  0x5a00c002	{	call PC+4 }
620    66 |
621    67 |  General Purpose Registers = {
622    68 |    r0 = 0x00000000
623    69 |    r1 = 0x00000000
624    70 |    r2 = 0x00000000
625    71 |    r3 = 0x00000000
626
627Remember, the instructions on lines 56-65 are ran on the CPU state shown below
628instructions, and as the CPU state has not diverged at this point, we know the
629starting state is accurate. The bug must then lie within the instructions shown
630here. Next we may notice that ``R1`` is only touched by lines 57 and 58, that is
631by
632
633::
634
635    58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
636    59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
637
638Therefore, we are either dealing with an correct load instruction
639``R1 = memw(R2+#0x0)`` or with an incorrect add ``R1 = add(R2,R1)``. At this
640point it might be easy enough to go directly to the emitted code for the
641instructions mentioned and look for bugs, but we could also run
642``./qemu-heaxgon -d op,in_asm float_conv`` where we find for the following
643tinycode for the Hexagon ``add`` instruction
644
645::
646
647   ---- 00021094
648   mov_i32 pkt_has_store_s1,$0x0
649   add_i32 tmp0,r2,r2
650   mov_i32 loc2,tmp0
651   mov_i32 new_r1,loc2
652   mov_i32 r1,new_r1
653
654Here we have finally located our bug ``add_i32 tmp0,r2,r2``.
655
656Limitations and Future Development
657----------------------------------
658
659The main limitation of the current parser is given by the syntax-driven nature
660of the Bison-generated parsers. This has the severe implication of only being
661able to generate code in the order of evaluation of the various rules, without,
662in any case, being able to backtrack and alter the generated code.
663
664An example limitation is highlighted by this statement of the input language:
665
666::
667
668   { (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); }
669
670This ternary assignment, when written in this form requires us to emit some
671proper control flow statements, which emit a jump to the first or to the second
672code block, whose implementation is extremely convoluted, because when matching
673the ternary assignment, the code evaluating the two assignments will be already
674generated.
675
676Instead we pre-process that statement, making it become:
677
678::
679
680   { PdV = ((PsV==0xff)) ? 0xff : 0x00; }
681
682Which can be easily matched by the following parser rules:
683
684::
685
686   statement             | rvalue ';'
687
688   rvalue                : rvalue QMARK rvalue COLON rvalue
689                         | rvalue EQ rvalue
690                         | LPAR rvalue RPAR
691                         | assign_statement
692                         | IMM
693
694   assign_statement      : pred ASSIGN rvalue
695
696Another example that highlight the limitation of the flex/bison parser can be
697found even in the add operation we already saw:
698
699::
700
701   TCGv_i32 tmp_0 = tcg_temp_new_i32();
702   tcg_gen_add_i32(tmp_0, RsV, RtV);
703   tcg_gen_mov_i32(RdV, tmp_0);
704
705The fact that we cannot directly use ``RdV`` as the destination of the sum is a
706consequence of the syntax-driven nature of the parser. In fact when we parse the
707assignment, the ``rvalue`` token, representing the sum has already been reduced,
708and thus its code emitted and unchangeable. We rely on the fact that QEMU will
709optimize our code reducing the useless move operations and the relative
710temporaries.
711
712A possible improvement of the parser regards the support for vectorial
713instructions and floating point instructions, which will require the extension
714of the scanner, the parser, and a partial re-design of the type system, allowing
715to build the vectorial semantics over the available vectorial tinycode generator
716primitives.
717
718A more radical improvement will use the parser, not to generate directly the
719tinycode generator code, but to generate an intermediate representation like the
720LLVM IR, which in turn could be compiled using the clang TCG backend. That code
721could be furtherly optimized, overcoming the limitations of the syntax-driven
722parsing and could lead to a more optimized generated code.
723