1Hexagon ISA instruction definitions to tinycode generator compiler
2------------------------------------------------------------------
3
4idef-parser is a small compiler able to translate the Hexagon ISA description
5language into tinycode generator code, that can be easily integrated into QEMU.
6
7Compilation Example
8-------------------
9
10To better understand the scope of the idef-parser, we'll explore an applicative
11example. Let's start by one of the simplest Hexagon instruction: the ``add``.
12
13The ISA description language represents the ``add`` instruction as
14follows:
15
16.. code:: c
17
18   A2_add(RdV, in RsV, in RtV) {
19       { RdV=RsV+RtV;}
20   }
21
22idef-parser will compile the above code into the following code:
23
24.. code:: c
25
26   /* A2_add */
27   void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
28                    TCGv_i32 RsV, TCGv_i32 RtV)
29   /*  { RdV=RsV+RtV;} */
30   {
31       TCGv_i32 tmp_0 = tcg_temp_new_i32();
32       tcg_gen_add_i32(tmp_0, RsV, RtV);
33       tcg_gen_mov_i32(RdV, tmp_0);
34   }
35
36The output of the compilation process will be a function, containing the
37tinycode generator code, implementing the correct semantics. That function will
38not access any global variable, because all the accessed data structures will be
39passed explicitly as function parameters. Among the passed parameters we will
40have TCGv (tinycode variables) representing the input and output registers of
41the architecture, integers representing the immediates that come from the code,
42and other data structures which hold information about the disassemblation
43context (``DisasContext`` struct).
44
45Let's begin by describing the input code. The ``add`` instruction is associated
46with a unique identifier, in this case ``A2_add``, which allows to distinguish
47variants of the same instruction, and expresses the class to which the
48instruction belongs, in this case ``A2`` corresponds to the Hexagon
49``ALU32/ALU`` instruction subclass.
50
51After the instruction identifier, we have a series of parameters that represents
52TCG variables that will be passed to the generated function. Parameters marked
53with ``in`` are already initialized, while the others are output parameters.
54
55We will leverage this information to infer several information:
56
57-  Fill in the output function signature with the correct TCGv registers
58-  Fill in the output function signature with the immediate integers
59-  Keep track of which registers, among the declared one, have been
60   initialized
61
62Let's now observe the actual instruction description code, in this case:
63
64.. code:: c
65
66   { RdV=RsV+RtV;}
67
68This code is composed by a subset of the C syntax, and is the result of the
69application of some macro definitions contained in the ``macros.h`` file.
70
71This file is used to reduce the complexity of the input language where complex
72variants of similar constructs can be mapped to a unique primitive, so that the
73idef-parser has to handle a lower number of computation primitives.
74
75As you may notice, the description code modifies the registers which have been
76declared by the declaration statements. In this case all the three registers
77will be declared, ``RsV`` and ``RtV`` will also be read and ``RdV`` will be
78written.
79
80Now let's have a quick look at the generated code, line by line.
81
82::
83
84   TCGv_i32 tmp_0 = tcg_temp_new_i32();
85
86This code starts by declaring a temporary TCGv to hold the result from the sum
87operation.
88
89::
90
91   tcg_gen_add_i32(tmp_0, RsV, RtV);
92
93Then, we are generating the sum tinycode operator between the selected
94registers, storing the result in the just declared temporary.
95
96::
97
98   tcg_gen_mov_i32(RdV, tmp_0);
99
100The result of the addition is now stored in the temporary, we move it into the
101correct destination register. This code may seem inefficient, but QEMU will
102perform some optimizations on the tinycode, reducing the unnecessary copy.
103
104Parser Input
105------------
106
107Before moving on to the structure of idef-parser itself, let us spend some words
108on its' input. There are two preprocessing steps applied to the generated
109instruction semantics in ``semantics_generated.pyinc`` that we need to consider.
110Firstly,
111
112::
113
114    gen_idef_parser_funcs.py
115
116which takes instruction semantics in ``semantics_generated.pyinc`` to C-like
117pseudo code, output into ``idef_parser_input.h.inc``. For instance, the
118``J2_jumpr`` instruction which jumps to an address stored in a register
119argument. This is instruction is defined as
120
121::
122
123    SEMANTICS( \
124        "J2_jumpr", \
125        "jumpr Rs32", \
126        """{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}""" \
127    )
128
129in ``semantics_generated.pyinc``. Running ``gen_idef_parser_funcs.py``
130we obtain the pseudo code
131
132::
133
134    J2_jumpr(in RsV) {
135        {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}
136    }
137
138with macros such as ``fJUMPR`` intact.
139
140The second step is to expand macros into a form suitable for our parser.
141These macros are defined in ``idef-parser/macros.inc`` and the step is
142carried out by the ``prepare`` script which runs the C preprocessor on
143``idef_parser_input.h.inc`` to produce
144``idef_parser_input.preprocessed.h.inc``.
145
146To finish the above example, after preprocessing ``J2_jumpr`` we obtain
147
148::
149
150    J2_jumpr(in RsV) {
151        {(PC = RsV);}
152    }
153
154where ``fJUMPR(RsN,RsV,COF_TYPE_JUMPR);`` was expanded to ``(PC = RsV)``,
155signifying a write to the Program Counter ``PC``.  Note, that ``PC`` in
156this expression is not a variable in the strict C sense since it is not
157declared anywhere, but rather a symbol which is easy to match in
158idef-parser later on.
159
160Parser Structure
161----------------
162
163The idef-parser is built using the ``flex`` and ``bison``.
164
165``flex`` is used to split the input string into tokens, each described using a
166regular expression. The token description is contained in the
167``idef-parser.lex`` source file. The flex-generated scanner takes care also to
168extract from the input text other meaningful information, e.g., the numerical
169value in case of an immediate constant, and decorates the token with the
170extracted information.
171
172``bison`` is used to generate the actual parser, starting from the parsing
173description contained in the ``idef-parser.y`` file. The generated parser
174executes the ``main`` function at the end of the ``idef-parser.y`` file, which
175opens input and output files, creates the parsing context, and eventually calls
176the ``yyparse()`` function, which starts the execution of the LALR(1) parser
177(see `Wikipedia <https://en.wikipedia.org/wiki/LALR_parser>`__ for more
178information about LALR parsing techniques). The LALR(1) parser, whenever it has
179to shift a token, calls the ``yylex()`` function, which is defined by the
180flex-generated code, and reads the input file returning the next scanned token.
181
182The tokens are mapped on the source language grammar, defined in the
183``idef-parser.y`` file to build a unique syntactic tree, according to the
184specified operator precedences and associativity rules.
185
186The grammar describes the whole file which contains the Hexagon instruction
187descriptions, therefore it starts from the ``input`` nonterminal, which is a
188list of instructions, each instruction is represented by the following grammar
189rule, representing the structure of the input file shown above:
190
191::
192
193   instruction : INAME arguments code
194               | error
195
196   arguments : '(' ')'
197             | '(' argument_list ')';
198
199   argument_list : argument_decl ',' argument_list
200                 | argument_decl
201
202   argument_decl : REG
203                 | PRED
204                 | IN REG
205                 | IN PRED
206                 | IMM
207                 | var
208                 ;
209
210   code        : '{' statements '}'
211
212   statements  : statements statement
213               | statement
214
215   statement   : control_statement
216               | var_decl ';'
217               | rvalue ';'
218               | code_block
219               | ';'
220
221   code_block  : '{' statements '}'
222               | '{' '}'
223
224With this initial portion of the grammar we are defining the instruction, its'
225arguments, and its' statements. Each argument is defined by the
226``argument_decl`` rule, and can be either
227
228::
229
230    Description                  Example
231    ----------------------------------------
232    output register              RsV
233    output predicate register    P0
234    input register               in RsV
235    input predicate register     in P0
236    immediate value              1234
237    local variable               EA
238
239Note, the only local variable allowed to be used as an argument is the effective
240address ``EA``. Similarly, each statement can be a ``control_statement``, a
241variable declaration such as ``int a;``, a code block, which is just a
242bracket-enclosed list of statements, a ``';'``, which is a ``nop`` instruction,
243and an ``rvalue ';'``.
244
245Expressions
246~~~~~~~~~~~
247
248Allowed in the input code are C language expressions with a few exceptions
249to simplify parsing. For instance, variable names such as ``RdV``, ``RssV``,
250``PdV``, ``CsV``, and other idiomatic register names from Hexagon, are
251reserved specifically for register arguments. These arguments then map to
252``TCGv_i32`` or ``TCGv_i64`` depending on the register size. Similarly, ``UiV``,
253``riV``, etc. refer to immediate arguments and will map to C integers.
254
255Also, as mentioned earlier, the names ``PC``, ``SP``, ``FP``, etc. are used to
256refer to Hexagon registers such as the program counter, stack pointer, and frame
257pointer seen here. Writes to these registers then correspond to assignments
258``PC = ...``, and reads correspond to uses of the variable ``PC``.
259
260Moreover, another example of one such exception is the selective expansion of
261macros present in ``macros.h``. As an example, consider the ``fABS`` macro which
262in plain C is defined as
263
264::
265
266    #define fABS(A) (((A) < 0) ? (-(A)) : (A))
267
268and returns the absolute value of the argument ``A``. This macro is not included
269in ``idef-parser/macros.inc`` and as such is not expanded and kept as a "call"
270``fABS(...)``. Reason being, that ``fABS`` is easier to match and map to
271``tcg_gen_abs_<width>``, compared to the full ternary expression above. Loads of
272macros in ``macros.h`` are kept unexpanded to aid in parsing, as seen in the
273example above, for more information see ``idef-parser/idef-parser.lex``.
274
275Finally, in mapping these input expressions to tinycode generators, idef-parser
276tries to perform as much as possible in plain C. Such as, performing binary
277operations in C instead of tinycode generators, thus effectively constant
278folding the expression.
279
280Variables and Variable Declarations
281~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
282
283Similarly to C, variables in the input code must be explicitly declared, such as
284``int var1;`` which declares an uninitialized variable ``var1``. Initialization
285``int var2 = 0;`` is also allowed and behaves as expected. In tinycode
286generators the previous declarations are mapped to
287
288::
289
290    int var1;           ->      TCGv_i32 var1 = tcg_temp_new_i32();
291
292    int var2 = 0;       ->      TCGv_i32 var1 = tcg_temp_new_i32();
293                                tcg_gen_movi_i32(j, ((int64_t) 0ULL));
294
295which are later automatically freed at the end of the function they're declared
296in. Contrary to C, we only allow variables to be declared with an integer type
297specified in the following table (without permutation of keywords)
298
299::
300
301    type                        bit-width    signedness
302    ----------------------------------------------------------
303    int                         32           signed
304    signed
305    signed int
306
307    unsigned                    32           unsigned
308    unsigned int
309
310    long                        64           signed
311    long int
312    signed long
313    signed long int
314
315    unsigned long               64           unsigned
316    unsigned long int
317
318    long long                   64           signed
319    long long int
320    signed long long
321    signed long long int
322
323    unsigned long long          64           unsigned
324    unsigned long long int
325
326    size[1,2,4,8][s,u]_t        8-64         signed or unsigned
327
328In idef-parser, variable names are matched by a generic ``VARID`` token,
329which will feature the variable name as a decoration. For a variable declaration
330idef-parser calls ``gen_varid_allocate`` with the ``VARID`` token to save the
331name, size, and bit width of the newly declared variable. In addition, this
332function also ensures that variables aren't declared multiple times, and prints
333and error message if that is the case. Upon use of a variable, the ``VARID``
334token is used to lookup the size and bit width of the variable.
335
336Type System
337~~~~~~~~~~~
338
339idef-parser features a simple type system which is used to correctly implement
340the signedness and bit width of the operations.
341
342The type of each ``rvalue`` is determined by two attributes: its bit width
343(``unsigned bit_width``) and its signedness (``HexSignedness signedness``).
344
345For each operation, the type of ``rvalue``\ s influence the way in which the
346operands are handled and emitted. For example a right shift between signed
347operators will be an arithmetic shift, while one between unsigned operators
348will be a logical shift. If one of the two operands is signed, and the other
349is unsigned, the operation will be signed.
350
351The bit width also influences the outcome of the operations, in particular while
352the input languages features a fine granularity type system, with types of 8,
35316, 32, 64 (and more for vectorial instructions) bits, the tinycode only
354features 32 and 64 bit widths. We propagate as much as possible the fine
355granularity type, until the value has to be used inside an operation between
356``rvalue``\ s; in that case if one of the two operands is greater than 32 bits
357we promote the whole operation to 64 bit, taking care of properly extending the
358two operands. Fortunately, the most critical instructions already feature
359explicit casts and zero/sign extensions which are properly propagated down to
360our parser.
361
362The combination of ``rvalue``\ s are handled through the use of the
363``gen_bin_op`` and ``gen_bin_cmp`` helper functions. These two functions handle
364the appropriate compile-time or run-time emission of operations to perform the
365required computation.
366
367Control Statements
368~~~~~~~~~~~~~~~~~~
369
370``control_statement``\ s are all the statements which modify the order of
371execution of the generated code according to input parameters. They are expanded
372by the following grammar rule:
373
374::
375
376   control_statement : frame_check
377                     | cancel_statement
378                     | if_statement
379                     | for_statement
380                     | fpart1_statement
381
382``if_statement``\ s require the emission of labels and branch instructions which
383effectively perform conditional jumps (``tcg_gen_brcondi``) according to the
384value of an expression. Note, the tinycode generators we produce for conditional
385statements do not perfectly mirror what would be expected in C, for instance we
386do not reproduce short-circuiting of the ``&&`` operator, and use of the ``||``
387operator is disallowed. All the predicated instructions, and in general all the
388instructions where there could be alternative values assigned to an ``lvalue``,
389like C-style ternary expressions:
390
391::
392
393   rvalue            : rvalue QMARK rvalue COLON rvalue
394
395are handled using the conditional move tinycode instruction
396(``tcg_gen_movcond``), which avoids the additional complexity of managing labels
397and jumps.
398
399Instead, regarding the ``for`` loops, exploiting the fact that they always
400iterate on immediate values, therefore their iteration ranges are always known
401at compile time, we implemented those emitting plain C ``for`` loops. This is
402possible because the loops will be executed in the QEMU code, leading to the
403consequential unrolling of the for loop, since the tinycode generator
404instructions will be executed multiple times, and the respective generated
405tinycode will represent the unrolled execution of the loop.
406
407Parsing Context
408~~~~~~~~~~~~~~~
409
410All the helper functions in ``idef-parser.y`` carry two fixed parameters, which
411are the parsing context ``c`` and the ``YYLLOC`` location information. The
412context is explicitly passed to all the functions because the parser we generate
413is a reentrant one, meaning that it does not have any global variable, and
414therefore the instruction compilation could easily be parallelized in the
415future. Finally for each rule we propagate information about the location of the
416involved tokens to generate pretty error reporting, able to highlight the
417portion of the input code which generated each error.
418
419Debugging
420---------
421
422Developing the idef-parser can lead to two types of errors: compile-time errors
423and parsing errors.
424
425Compile-time errors in Bison-generated parsers are usually due to conflicts in
426the described grammar. Conflicts forbid the grammar to produce a unique
427derivation tree, thus must be solved (except for the dangling else problem,
428which is marked as expected through the ``%expect 1`` Bison option).
429
430For solving conflicts you need a basic understanding of `shift-reduce conflicts
431<https://www.gnu.org/software/Bison/manual/html_node/Shift_002fReduce.html>`__
432and `reduce-reduce conflicts
433<https://www.gnu.org/software/Bison/manual/html_node/Reduce_002fReduce.html>`__,
434then, if you are using a Bison version > 3.7.1 you can ask Bison to generate
435some counterexamples which highlight ambiguous derivations, passing the
436``-Wcex`` option to Bison. In general shift/reduce conflicts are solved by
437redesigning the grammar in an unambiguous way or by setting the token priority
438correctly, while reduce/reduce conflicts are solved by redesigning the
439interested part of the grammar.
440
441Run-time errors can be divided between lexing and parsing errors, lexing errors
442are hard to detect, since the ``var`` token will catch everything which is not
443caught by other tokens, but easy to fix, because most of the time a simple
444regex editing will be enough.
445
446idef-parser features a fancy parsing error reporting scheme, which for each
447parsing error reports the fragment of the input text which was involved in the
448parsing rule that generated an error.
449
450Implementing an instruction goes through several sequential steps, here are some
451suggestions to make each instruction proceed to the next step.
452
453-  not-emitted
454
455   Means that the parsing of the input code relative to that instruction failed,
456   this could be due to a lexical error or to some mismatch between the order of
457   valid tokens and a parser rule. You should check that tokens are correctly
458   identified and mapped, and that there is a rule matching the token sequence
459   that you need to parse.
460
461-  emitted
462
463   This instruction class contains all the instructions which are emitted but
464   fail to compile when included in QEMU. The compilation errors are shown by
465   the QEMU building process and will lead to fixing the bug.  Most common
466   errors regard the mismatch of parameters for tinycode generator functions,
467   which boil down to errors in the idef-parser type system.
468
469-  compiled
470
471   These instruction generate valid tinycode generator code, which however fail
472   the QEMU or the harness tests, these cases must be handled manually by
473   looking into the failing tests and looking at the generated tinycode
474   generator instruction and at the generated tinycode itself. Tip: handle the
475   failing harness tests first, because they usually feature only a single
476   instruction, thus will require less execution trace navigation. If a
477   multi-threaded test fail, fixing all the other tests will be the easier
478   option, hoping that the multi-threaded one will be indirectly fixed.
479
480   An example of debugging this type of failure is provided in the following
481   section.
482
483-  tests-passed
484
485   This is the final goal for each instruction, meaning that the instruction
486   passes the test suite.
487
488Another approach to fix QEMU system test, where many instructions might fail, is
489to compare the execution trace of your implementation with the reference
490implementations already present in QEMU. To do so you should obtain a QEMU build
491where the instruction pass the test, and run it with the following command:
492
493::
494
495   sudo unshare -p sudo -u <USER> bash -c \
496   'env -i <qemu-hexagon full path> -d cpu <TEST>'
497
498And do the same for your implementation, the generated execution traces will be
499inherently aligned and can be inspected for behavioral differences using the
500``diff`` tool.
501
502Example of debugging erroneous tinycode generator code
503~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
504
505The goal of this section is to provide a complete example of debugging
506incorrectly emitted tinycode generator for a single instruction.
507
508Let's first introduce a bug in the tinycode generator of the ``A2_add``
509instruction,
510
511::
512
513    void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
514                     TCGv_i32 RsV, TCGv_i32 RtV)
515    /*  RdV=RsV+RtV;} */
516    {
517        TCGv_i32 tmp_0 = tcg_temp_new_i32();
518        tcg_gen_add_i32(tmp_0, RsV, RsV);
519        tcg_gen_mov_i32(RdV, tmp_0);
520    }
521
522Here the bug, albeit hard to spot, is in ``tcg_gen_add_i32(tmp_0, RsV, RsV);``
523where we compute ``RsV + RsV`` instead of ``RsV + RtV``, as would be expected.
524This particular bug is a bit tricky to pinpoint when debugging, since the
525``A2_add`` instruction is so ubiquitous. As a result, pretty much all tests will
526fail and therefore not provide a lot of information about the bug.
527
528For example, let's run the ``check-tcg`` tests
529
530::
531
532    make check-tcg TIMEOUT=1200 \
533                   DOCKER_IMAGE=debian-hexagon-cross \
534                   ENGINE=podman V=1 \
535                   DOCKER_CROSS_CC_GUEST=hexagon-unknown-linux-musl-clang
536
537In the output, we find a failure in the very first test case ``float_convs``
538due to a segmentation fault. Similarly, all harness and libc tests will fail as
539well. At this point we have no clue where the actual bug lies, and need to start
540ruling out instructions. As such a good starting point is to utilize the debug
541options ``-d in_asm,cpu`` of QEMU to inspect the Hexagon instructions being run,
542alongside the CPU state. We additionally need a working version of the emulator
543to compare our buggy CPU state against, running
544
545::
546
547    meson configure -Dhexagon_idef_parser=false
548
549will disable the idef-parser for all instructions and fallback on manual
550tinycode generator overrides, or on helper function implementations. Recompiling
551gives us ``qemu-hexagon`` which passes all tests. If ``qemu-hexagon-buggy`` is
552our binary with the incorrect tinycode generators, we can compare the CPU state
553between the two versions
554
555::
556
557    ./qemu-hexagon-buggy -d in_asm,cpu float_convs &> out_buggy
558    ./qemu-hexagon       -d in_asm,cpu float_convs &> out_working
559
560Looking at ``diff -u out_buggy out_working`` shows us that the CPU state begins
561to diverge on line 141, with an incorrect value in the ``R1`` register
562
563::
564
565    @@ -138,7 +138,7 @@
566
567     General Purpose Registers = {
568       r0 = 0x4100f9c0
569    -  r1 = 0x00042108
570    +  r1 = 0x00000000
571       r2 = 0x00021084
572       r3 = 0x00000000
573       r4 = 0x00000000
574
575If we also look into ``out_buggy`` directly we can inspect the input assembly
576which the caused the incorrect CPU state, around line 141 we find
577
578::
579
580    116 |  ----------------
581    117 |  IN: _start_c
582    118 |  0x000210b0:  0xa09dc002	{	allocframe(R29,#0x10):raw }
583    ... |  ...
584    137 |  0x000210fc:  0x5a00c4aa	{	call PC+2388 }
585    138 |
586    139 |  General Purpose Registers = {
587    140 |    r0 = 0x4100fa70
588    141 |    r1 = 0x00042108
589    142 |    r2 = 0x00021084
590    143 |    r3 = 0x00000000
591
592Importantly, we see some Hexagon assembly followed by a dump of the CPU state,
593now the CPU state is actually dumped before the input assembly above is ran.
594As such, we are actually interested in the instructions ran before this.
595
596Scrolling up a bit, we find
597
598::
599
600    54 |  ----------------
601    55 |  IN: _start
602    56 |  0x00021088:  0x6a09c002	{	R2 = C9/pc }
603    57 |  0x0002108c:  0xbfe2ff82	{	R2 = add(R2,#0xfffffffc) }
604    58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
605    59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
606    60 |  0x00021098:  0x7800c01e	{	R30 = #0x0 }
607    61 |  0x0002109c:  0x707dc000	{	R0 = R29 }
608    62 |  0x000210a0:  0x763dfe1d	{	R29 = and(R29,#0xfffffff0) }
609    63 |  0x000210a4:  0xa79dfdfe	{	memw(R29+#0xfffffff8) = R29 }
610    64 |  0x000210a8:  0xbffdff1d	{	R29 = add(R29,#0xfffffff8) }
611    65 |  0x000210ac:  0x5a00c002	{	call PC+4 }
612    66 |
613    67 |  General Purpose Registers = {
614    68 |    r0 = 0x00000000
615    69 |    r1 = 0x00000000
616    70 |    r2 = 0x00000000
617    71 |    r3 = 0x00000000
618
619Remember, the instructions on lines 56-65 are ran on the CPU state shown below
620instructions, and as the CPU state has not diverged at this point, we know the
621starting state is accurate. The bug must then lie within the instructions shown
622here. Next we may notice that ``R1`` is only touched by lines 57 and 58, that is
623by
624
625::
626
627    58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
628    59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
629
630Therefore, we are either dealing with an correct load instruction
631``R1 = memw(R2+#0x0)`` or with an incorrect add ``R1 = add(R2,R1)``. At this
632point it might be easy enough to go directly to the emitted code for the
633instructions mentioned and look for bugs, but we could also run
634``./qemu-heaxgon -d op,in_asm float_conv`` where we find for the following
635tinycode for the Hexagon ``add`` instruction
636
637::
638
639   ---- 00021094
640   mov_i32 pkt_has_store_s1,$0x0
641   add_i32 tmp0,r2,r2
642   mov_i32 loc2,tmp0
643   mov_i32 new_r1,loc2
644   mov_i32 r1,new_r1
645
646Here we have finally located our bug ``add_i32 tmp0,r2,r2``.
647
648Limitations and Future Development
649----------------------------------
650
651The main limitation of the current parser is given by the syntax-driven nature
652of the Bison-generated parsers. This has the severe implication of only being
653able to generate code in the order of evaluation of the various rules, without,
654in any case, being able to backtrack and alter the generated code.
655
656An example limitation is highlighted by this statement of the input language:
657
658::
659
660   { (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); }
661
662This ternary assignment, when written in this form requires us to emit some
663proper control flow statements, which emit a jump to the first or to the second
664code block, whose implementation is extremely convoluted, because when matching
665the ternary assignment, the code evaluating the two assignments will be already
666generated.
667
668Instead we pre-process that statement, making it become:
669
670::
671
672   { PdV = ((PsV==0xff)) ? 0xff : 0x00; }
673
674Which can be easily matched by the following parser rules:
675
676::
677
678   statement             | rvalue ';'
679
680   rvalue                : rvalue QMARK rvalue COLON rvalue
681                         | rvalue EQ rvalue
682                         | LPAR rvalue RPAR
683                         | assign_statement
684                         | IMM
685
686   assign_statement      : pred ASSIGN rvalue
687
688Another example that highlight the limitation of the flex/bison parser can be
689found even in the add operation we already saw:
690
691::
692
693   TCGv_i32 tmp_0 = tcg_temp_new_i32();
694   tcg_gen_add_i32(tmp_0, RsV, RtV);
695   tcg_gen_mov_i32(RdV, tmp_0);
696
697The fact that we cannot directly use ``RdV`` as the destination of the sum is a
698consequence of the syntax-driven nature of the parser. In fact when we parse the
699assignment, the ``rvalue`` token, representing the sum has already been reduced,
700and thus its code emitted and unchangeable. We rely on the fact that QEMU will
701optimize our code reducing the useless move operations and the relative
702temporaries.
703
704A possible improvement of the parser regards the support for vectorial
705instructions and floating point instructions, which will require the extension
706of the scanner, the parser, and a partial re-design of the type system, allowing
707to build the vectorial semantics over the available vectorial tinycode generator
708primitives.
709
710A more radical improvement will use the parser, not to generate directly the
711tinycode generator code, but to generate an intermediate representation like the
712LLVM IR, which in turn could be compiled using the clang TCG backend. That code
713could be furtherly optimized, overcoming the limitations of the syntax-driven
714parsing and could lead to a more optimized generated code.
715