Name
Date
Size
#Lines
LOC

..--

README.rstH A D25-Sep-202427.6 KiB715536

idef-parser.hH A D09-Jun-20249.3 KiB242125

idef-parser.lexH A D18-May-202323.2 KiB476444

idef-parser.yH A D08-Aug-202428.9 KiB913801

macros.h.incH A D25-Sep-20245.3 KiB132114

parser-helpers.cH A D09-Jun-202470.3 KiB2,1491,757

parser-helpers.hH A D09-Jun-202411 KiB354223

prepareH A D28-Nov-2023809 253

README.rst

1 Hexagon ISA instruction definitions to tinycode generator compiler
2 ------------------------------------------------------------------
3 
4 idef-parser is a small compiler able to translate the Hexagon ISA description
5 language into tinycode generator code, that can be easily integrated into QEMU.
6 
7 Compilation Example
8 -------------------
9 
10 To better understand the scope of the idef-parser, we'll explore an applicative
11 example. Let's start by one of the simplest Hexagon instruction: the ``add``.
12 
13 The ISA description language represents the ``add`` instruction as
14 follows:
15 
16 .. code:: c
17 
18    A2_add(RdV, in RsV, in RtV) {
19        { RdV=RsV+RtV;}
20    }
21 
22 idef-parser will compile the above code into the following code:
23 
24 .. code:: c
25 
26    /* A2_add */
27    void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
28                     TCGv_i32 RsV, TCGv_i32 RtV)
29    /*  { RdV=RsV+RtV;} */
30    {
31        TCGv_i32 tmp_0 = tcg_temp_new_i32();
32        tcg_gen_add_i32(tmp_0, RsV, RtV);
33        tcg_gen_mov_i32(RdV, tmp_0);
34    }
35 
36 The output of the compilation process will be a function, containing the
37 tinycode generator code, implementing the correct semantics. That function will
38 not access any global variable, because all the accessed data structures will be
39 passed explicitly as function parameters. Among the passed parameters we will
40 have TCGv (tinycode variables) representing the input and output registers of
41 the architecture, integers representing the immediates that come from the code,
42 and other data structures which hold information about the disassemblation
43 context (``DisasContext`` struct).
44 
45 Let's begin by describing the input code. The ``add`` instruction is associated
46 with a unique identifier, in this case ``A2_add``, which allows to distinguish
47 variants of the same instruction, and expresses the class to which the
48 instruction belongs, in this case ``A2`` corresponds to the Hexagon
49 ``ALU32/ALU`` instruction subclass.
50 
51 After the instruction identifier, we have a series of parameters that represents
52 TCG variables that will be passed to the generated function. Parameters marked
53 with ``in`` are already initialized, while the others are output parameters.
54 
55 We will leverage this information to infer several information:
56 
57 -  Fill in the output function signature with the correct TCGv registers
58 -  Fill in the output function signature with the immediate integers
59 -  Keep track of which registers, among the declared one, have been
60    initialized
61 
62 Let's now observe the actual instruction description code, in this case:
63 
64 .. code:: c
65 
66    { RdV=RsV+RtV;}
67 
68 This code is composed by a subset of the C syntax, and is the result of the
69 application of some macro definitions contained in the ``macros.h`` file.
70 
71 This file is used to reduce the complexity of the input language where complex
72 variants of similar constructs can be mapped to a unique primitive, so that the
73 idef-parser has to handle a lower number of computation primitives.
74 
75 As you may notice, the description code modifies the registers which have been
76 declared by the declaration statements. In this case all the three registers
77 will be declared, ``RsV`` and ``RtV`` will also be read and ``RdV`` will be
78 written.
79 
80 Now let's have a quick look at the generated code, line by line.
81 
82 ::
83 
84    TCGv_i32 tmp_0 = tcg_temp_new_i32();
85 
86 This code starts by declaring a temporary TCGv to hold the result from the sum
87 operation.
88 
89 ::
90 
91    tcg_gen_add_i32(tmp_0, RsV, RtV);
92 
93 Then, we are generating the sum tinycode operator between the selected
94 registers, storing the result in the just declared temporary.
95 
96 ::
97 
98    tcg_gen_mov_i32(RdV, tmp_0);
99 
100 The result of the addition is now stored in the temporary, we move it into the
101 correct destination register. This code may seem inefficient, but QEMU will
102 perform some optimizations on the tinycode, reducing the unnecessary copy.
103 
104 Parser Input
105 ------------
106 
107 Before moving on to the structure of idef-parser itself, let us spend some words
108 on its' input. There are two preprocessing steps applied to the generated
109 instruction semantics in ``semantics_generated.pyinc`` that we need to consider.
110 Firstly,
111 
112 ::
113 
114     gen_idef_parser_funcs.py
115 
116 which takes instruction semantics in ``semantics_generated.pyinc`` to C-like
117 pseudo code, output into ``idef_parser_input.h.inc``. For instance, the
118 ``J2_jumpr`` instruction which jumps to an address stored in a register
119 argument. This is instruction is defined as
120 
121 ::
122 
123     SEMANTICS( \
124         "J2_jumpr", \
125         "jumpr Rs32", \
126         """{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}""" \
127     )
128 
129 in ``semantics_generated.pyinc``. Running ``gen_idef_parser_funcs.py``
130 we obtain the pseudo code
131 
132 ::
133 
134     J2_jumpr(in RsV) {
135         {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);}
136     }
137 
138 with macros such as ``fJUMPR`` intact.
139 
140 The second step is to expand macros into a form suitable for our parser.
141 These macros are defined in ``idef-parser/macros.h.inc`` and the step is
142 carried out by the ``prepare`` script which runs the C preprocessor on
143 ``idef_parser_input.h.inc`` to produce
144 ``idef_parser_input.preprocessed.h.inc``.
145 
146 To finish the above example, after preprocessing ``J2_jumpr`` we obtain
147 
148 ::
149 
150     J2_jumpr(in RsV) {
151         {(PC = RsV);}
152     }
153 
154 where ``fJUMPR(RsN,RsV,COF_TYPE_JUMPR);`` was expanded to ``(PC = RsV)``,
155 signifying a write to the Program Counter ``PC``.  Note, that ``PC`` in
156 this expression is not a variable in the strict C sense since it is not
157 declared anywhere, but rather a symbol which is easy to match in
158 idef-parser later on.
159 
160 Parser Structure
161 ----------------
162 
163 The idef-parser is built using the ``flex`` and ``bison``.
164 
165 ``flex`` is used to split the input string into tokens, each described using a
166 regular expression. The token description is contained in the
167 ``idef-parser.lex`` source file. The flex-generated scanner takes care also to
168 extract from the input text other meaningful information, e.g., the numerical
169 value in case of an immediate constant, and decorates the token with the
170 extracted information.
171 
172 ``bison`` is used to generate the actual parser, starting from the parsing
173 description contained in the ``idef-parser.y`` file. The generated parser
174 executes the ``main`` function at the end of the ``idef-parser.y`` file, which
175 opens input and output files, creates the parsing context, and eventually calls
176 the ``yyparse()`` function, which starts the execution of the LALR(1) parser
177 (see `Wikipedia <https://en.wikipedia.org/wiki/LALR_parser>`__ for more
178 information about LALR parsing techniques). The LALR(1) parser, whenever it has
179 to shift a token, calls the ``yylex()`` function, which is defined by the
180 flex-generated code, and reads the input file returning the next scanned token.
181 
182 The tokens are mapped on the source language grammar, defined in the
183 ``idef-parser.y`` file to build a unique syntactic tree, according to the
184 specified operator precedences and associativity rules.
185 
186 The grammar describes the whole file which contains the Hexagon instruction
187 descriptions, therefore it starts from the ``input`` nonterminal, which is a
188 list of instructions, each instruction is represented by the following grammar
189 rule, representing the structure of the input file shown above:
190 
191 ::
192 
193    instruction : INAME arguments code
194                | error
195 
196    arguments : '(' ')'
197              | '(' argument_list ')';
198 
199    argument_list : argument_decl ',' argument_list
200                  | argument_decl
201 
202    argument_decl : REG
203                  | PRED
204                  | IN REG
205                  | IN PRED
206                  | IMM
207                  | var
208                  ;
209 
210    code        : '{' statements '}'
211 
212    statements  : statements statement
213                | statement
214 
215    statement   : control_statement
216                | var_decl ';'
217                | rvalue ';'
218                | code_block
219                | ';'
220 
221    code_block  : '{' statements '}'
222                | '{' '}'
223 
224 With this initial portion of the grammar we are defining the instruction, its'
225 arguments, and its' statements. Each argument is defined by the
226 ``argument_decl`` rule, and can be either
227 
228 ::
229 
230     Description                  Example
231     ----------------------------------------
232     output register              RsV
233     output predicate register    P0
234     input register               in RsV
235     input predicate register     in P0
236     immediate value              1234
237     local variable               EA
238 
239 Note, the only local variable allowed to be used as an argument is the effective
240 address ``EA``. Similarly, each statement can be a ``control_statement``, a
241 variable declaration such as ``int a;``, a code block, which is just a
242 bracket-enclosed list of statements, a ``';'``, which is a ``nop`` instruction,
243 and an ``rvalue ';'``.
244 
245 Expressions
246 ~~~~~~~~~~~
247 
248 Allowed in the input code are C language expressions with a few exceptions
249 to simplify parsing. For instance, variable names such as ``RdV``, ``RssV``,
250 ``PdV``, ``CsV``, and other idiomatic register names from Hexagon, are
251 reserved specifically for register arguments. These arguments then map to
252 ``TCGv_i32`` or ``TCGv_i64`` depending on the register size. Similarly, ``UiV``,
253 ``riV``, etc. refer to immediate arguments and will map to C integers.
254 
255 Also, as mentioned earlier, the names ``PC``, ``SP``, ``FP``, etc. are used to
256 refer to Hexagon registers such as the program counter, stack pointer, and frame
257 pointer seen here. Writes to these registers then correspond to assignments
258 ``PC = ...``, and reads correspond to uses of the variable ``PC``.
259 
260 Moreover, another example of one such exception is the selective expansion of
261 macros present in ``macros.h``. As an example, consider the ``fABS`` macro which
262 in plain C is defined as
263 
264 ::
265 
266     #define fABS(A) (((A) < 0) ? (-(A)) : (A))
267 
268 and returns the absolute value of the argument ``A``. This macro is not included
269 in ``idef-parser/macros.h.inc`` and as such is not expanded and kept as a "call"
270 ``fABS(...)``. Reason being, that ``fABS`` is easier to match and map to
271 ``tcg_gen_abs_<width>``, compared to the full ternary expression above. Loads of
272 macros in ``macros.h`` are kept unexpanded to aid in parsing, as seen in the
273 example above, for more information see ``idef-parser/idef-parser.lex``.
274 
275 Finally, in mapping these input expressions to tinycode generators, idef-parser
276 tries to perform as much as possible in plain C. Such as, performing binary
277 operations in C instead of tinycode generators, thus effectively constant
278 folding the expression.
279 
280 Variables and Variable Declarations
281 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
282 
283 Similarly to C, variables in the input code must be explicitly declared, such as
284 ``int var1;`` which declares an uninitialized variable ``var1``. Initialization
285 ``int var2 = 0;`` is also allowed and behaves as expected. In tinycode
286 generators the previous declarations are mapped to
287 
288 ::
289 
290     int var1;           ->      TCGv_i32 var1 = tcg_temp_new_i32();
291 
292     int var2 = 0;       ->      TCGv_i32 var1 = tcg_temp_new_i32();
293                                 tcg_gen_movi_i32(j, ((int64_t) 0ULL));
294 
295 which are later automatically freed at the end of the function they're declared
296 in. Contrary to C, we only allow variables to be declared with an integer type
297 specified in the following table (without permutation of keywords)
298 
299 ::
300 
301     type                        bit-width    signedness
302     ----------------------------------------------------------
303     int                         32           signed
304     signed
305     signed int
306 
307     unsigned                    32           unsigned
308     unsigned int
309 
310     long                        64           signed
311     long int
312     signed long
313     signed long int
314 
315     unsigned long               64           unsigned
316     unsigned long int
317 
318     long long                   64           signed
319     long long int
320     signed long long
321     signed long long int
322 
323     unsigned long long          64           unsigned
324     unsigned long long int
325 
326     size[1,2,4,8][s,u]_t        8-64         signed or unsigned
327 
328 In idef-parser, variable names are matched by a generic ``VARID`` token,
329 which will feature the variable name as a decoration. For a variable declaration
330 idef-parser calls ``gen_varid_allocate`` with the ``VARID`` token to save the
331 name, size, and bit width of the newly declared variable. In addition, this
332 function also ensures that variables aren't declared multiple times, and prints
333 and error message if that is the case. Upon use of a variable, the ``VARID``
334 token is used to lookup the size and bit width of the variable.
335 
336 Type System
337 ~~~~~~~~~~~
338 
339 idef-parser features a simple type system which is used to correctly implement
340 the signedness and bit width of the operations.
341 
342 The type of each ``rvalue`` is determined by two attributes: its bit width
343 (``unsigned bit_width``) and its signedness (``HexSignedness signedness``).
344 
345 For each operation, the type of ``rvalue``\ s influence the way in which the
346 operands are handled and emitted. For example a right shift between signed
347 operators will be an arithmetic shift, while one between unsigned operators
348 will be a logical shift. If one of the two operands is signed, and the other
349 is unsigned, the operation will be signed.
350 
351 The bit width also influences the outcome of the operations, in particular while
352 the input languages features a fine granularity type system, with types of 8,
353 16, 32, 64 (and more for vectorial instructions) bits, the tinycode only
354 features 32 and 64 bit widths. We propagate as much as possible the fine
355 granularity type, until the value has to be used inside an operation between
356 ``rvalue``\ s; in that case if one of the two operands is greater than 32 bits
357 we promote the whole operation to 64 bit, taking care of properly extending the
358 two operands. Fortunately, the most critical instructions already feature
359 explicit casts and zero/sign extensions which are properly propagated down to
360 our parser.
361 
362 The combination of ``rvalue``\ s are handled through the use of the
363 ``gen_bin_op`` and ``gen_bin_cmp`` helper functions. These two functions handle
364 the appropriate compile-time or run-time emission of operations to perform the
365 required computation.
366 
367 Control Statements
368 ~~~~~~~~~~~~~~~~~~
369 
370 ``control_statement``\ s are all the statements which modify the order of
371 execution of the generated code according to input parameters. They are expanded
372 by the following grammar rule:
373 
374 ::
375 
376    control_statement : frame_check
377                      | cancel_statement
378                      | if_statement
379                      | for_statement
380                      | fpart1_statement
381 
382 ``if_statement``\ s require the emission of labels and branch instructions which
383 effectively perform conditional jumps (``tcg_gen_brcondi``) according to the
384 value of an expression. Note, the tinycode generators we produce for conditional
385 statements do not perfectly mirror what would be expected in C, for instance we
386 do not reproduce short-circuiting of the ``&&`` operator, and use of the ``||``
387 operator is disallowed. All the predicated instructions, and in general all the
388 instructions where there could be alternative values assigned to an ``lvalue``,
389 like C-style ternary expressions:
390 
391 ::
392 
393    rvalue            : rvalue QMARK rvalue COLON rvalue
394 
395 are handled using the conditional move tinycode instruction
396 (``tcg_gen_movcond``), which avoids the additional complexity of managing labels
397 and jumps.
398 
399 Instead, regarding the ``for`` loops, exploiting the fact that they always
400 iterate on immediate values, therefore their iteration ranges are always known
401 at compile time, we implemented those emitting plain C ``for`` loops. This is
402 possible because the loops will be executed in the QEMU code, leading to the
403 consequential unrolling of the for loop, since the tinycode generator
404 instructions will be executed multiple times, and the respective generated
405 tinycode will represent the unrolled execution of the loop.
406 
407 Parsing Context
408 ~~~~~~~~~~~~~~~
409 
410 All the helper functions in ``idef-parser.y`` carry two fixed parameters, which
411 are the parsing context ``c`` and the ``YYLLOC`` location information. The
412 context is explicitly passed to all the functions because the parser we generate
413 is a reentrant one, meaning that it does not have any global variable, and
414 therefore the instruction compilation could easily be parallelized in the
415 future. Finally for each rule we propagate information about the location of the
416 involved tokens to generate pretty error reporting, able to highlight the
417 portion of the input code which generated each error.
418 
419 Debugging
420 ---------
421 
422 Developing the idef-parser can lead to two types of errors: compile-time errors
423 and parsing errors.
424 
425 Compile-time errors in Bison-generated parsers are usually due to conflicts in
426 the described grammar. Conflicts forbid the grammar to produce a unique
427 derivation tree, thus must be solved (except for the dangling else problem,
428 which is marked as expected through the ``%expect 1`` Bison option).
429 
430 For solving conflicts you need a basic understanding of `shift-reduce conflicts
431 <https://www.gnu.org/software/Bison/manual/html_node/Shift_002fReduce.html>`__
432 and `reduce-reduce conflicts
433 <https://www.gnu.org/software/Bison/manual/html_node/Reduce_002fReduce.html>`__,
434 then, if you are using a Bison version > 3.7.1 you can ask Bison to generate
435 some counterexamples which highlight ambiguous derivations, passing the
436 ``-Wcex`` option to Bison. In general shift/reduce conflicts are solved by
437 redesigning the grammar in an unambiguous way or by setting the token priority
438 correctly, while reduce/reduce conflicts are solved by redesigning the
439 interested part of the grammar.
440 
441 Run-time errors can be divided between lexing and parsing errors, lexing errors
442 are hard to detect, since the ``var`` token will catch everything which is not
443 caught by other tokens, but easy to fix, because most of the time a simple
444 regex editing will be enough.
445 
446 idef-parser features a fancy parsing error reporting scheme, which for each
447 parsing error reports the fragment of the input text which was involved in the
448 parsing rule that generated an error.
449 
450 Implementing an instruction goes through several sequential steps, here are some
451 suggestions to make each instruction proceed to the next step.
452 
453 -  not-emitted
454 
455    Means that the parsing of the input code relative to that instruction failed,
456    this could be due to a lexical error or to some mismatch between the order of
457    valid tokens and a parser rule. You should check that tokens are correctly
458    identified and mapped, and that there is a rule matching the token sequence
459    that you need to parse.
460 
461 -  emitted
462 
463    This instruction class contains all the instructions which are emitted but
464    fail to compile when included in QEMU. The compilation errors are shown by
465    the QEMU building process and will lead to fixing the bug.  Most common
466    errors regard the mismatch of parameters for tinycode generator functions,
467    which boil down to errors in the idef-parser type system.
468 
469 -  compiled
470 
471    These instruction generate valid tinycode generator code, which however fail
472    the QEMU or the harness tests, these cases must be handled manually by
473    looking into the failing tests and looking at the generated tinycode
474    generator instruction and at the generated tinycode itself. Tip: handle the
475    failing harness tests first, because they usually feature only a single
476    instruction, thus will require less execution trace navigation. If a
477    multi-threaded test fail, fixing all the other tests will be the easier
478    option, hoping that the multi-threaded one will be indirectly fixed.
479 
480    An example of debugging this type of failure is provided in the following
481    section.
482 
483 -  tests-passed
484 
485    This is the final goal for each instruction, meaning that the instruction
486    passes the test suite.
487 
488 Another approach to fix QEMU system test, where many instructions might fail, is
489 to compare the execution trace of your implementation with the reference
490 implementations already present in QEMU. To do so you should obtain a QEMU build
491 where the instruction pass the test, and run it with the following command:
492 
493 ::
494 
495    sudo unshare -p sudo -u <USER> bash -c \
496    'env -i <qemu-hexagon full path> -d cpu <TEST>'
497 
498 And do the same for your implementation, the generated execution traces will be
499 inherently aligned and can be inspected for behavioral differences using the
500 ``diff`` tool.
501 
502 Example of debugging erroneous tinycode generator code
503 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
504 
505 The goal of this section is to provide a complete example of debugging
506 incorrectly emitted tinycode generator for a single instruction.
507 
508 Let's first introduce a bug in the tinycode generator of the ``A2_add``
509 instruction,
510 
511 ::
512 
513     void emit_A2_add(DisasContext *ctx, Insn *insn, Packet *pkt, TCGv_i32 RdV,
514                      TCGv_i32 RsV, TCGv_i32 RtV)
515     /*  RdV=RsV+RtV;} */
516     {
517         TCGv_i32 tmp_0 = tcg_temp_new_i32();
518         tcg_gen_add_i32(tmp_0, RsV, RsV);
519         tcg_gen_mov_i32(RdV, tmp_0);
520     }
521 
522 Here the bug, albeit hard to spot, is in ``tcg_gen_add_i32(tmp_0, RsV, RsV);``
523 where we compute ``RsV + RsV`` instead of ``RsV + RtV``, as would be expected.
524 This particular bug is a bit tricky to pinpoint when debugging, since the
525 ``A2_add`` instruction is so ubiquitous. As a result, pretty much all tests will
526 fail and therefore not provide a lot of information about the bug.
527 
528 For example, let's run the ``check-tcg`` tests
529 
530 ::
531 
532     make check-tcg TIMEOUT=1200 \
533                    DOCKER_IMAGE=debian-hexagon-cross \
534                    ENGINE=podman V=1 \
535                    DOCKER_CROSS_CC_GUEST=hexagon-unknown-linux-musl-clang
536 
537 In the output, we find a failure in the very first test case ``float_convs``
538 due to a segmentation fault. Similarly, all harness and libc tests will fail as
539 well. At this point we have no clue where the actual bug lies, and need to start
540 ruling out instructions. As such a good starting point is to utilize the debug
541 options ``-d in_asm,cpu`` of QEMU to inspect the Hexagon instructions being run,
542 alongside the CPU state. We additionally need a working version of the emulator
543 to compare our buggy CPU state against, running
544 
545 ::
546 
547     meson configure -Dhexagon_idef_parser=false
548 
549 will disable the idef-parser for all instructions and fallback on manual
550 tinycode generator overrides, or on helper function implementations. Recompiling
551 gives us ``qemu-hexagon`` which passes all tests. If ``qemu-hexagon-buggy`` is
552 our binary with the incorrect tinycode generators, we can compare the CPU state
553 between the two versions
554 
555 ::
556 
557     ./qemu-hexagon-buggy -d in_asm,cpu float_convs &> out_buggy
558     ./qemu-hexagon       -d in_asm,cpu float_convs &> out_working
559 
560 Looking at ``diff -u out_buggy out_working`` shows us that the CPU state begins
561 to diverge on line 141, with an incorrect value in the ``R1`` register
562 
563 ::
564 
565     @@ -138,7 +138,7 @@
566 
567      General Purpose Registers = {
568        r0 = 0x4100f9c0
569     -  r1 = 0x00042108
570     +  r1 = 0x00000000
571        r2 = 0x00021084
572        r3 = 0x00000000
573        r4 = 0x00000000
574 
575 If we also look into ``out_buggy`` directly we can inspect the input assembly
576 which the caused the incorrect CPU state, around line 141 we find
577 
578 ::
579 
580     116 |  ----------------
581     117 |  IN: _start_c
582     118 |  0x000210b0:  0xa09dc002	{	allocframe(R29,#0x10):raw }
583     ... |  ...
584     137 |  0x000210fc:  0x5a00c4aa	{	call PC+2388 }
585     138 |
586     139 |  General Purpose Registers = {
587     140 |    r0 = 0x4100fa70
588     141 |    r1 = 0x00042108
589     142 |    r2 = 0x00021084
590     143 |    r3 = 0x00000000
591 
592 Importantly, we see some Hexagon assembly followed by a dump of the CPU state,
593 now the CPU state is actually dumped before the input assembly above is ran.
594 As such, we are actually interested in the instructions ran before this.
595 
596 Scrolling up a bit, we find
597 
598 ::
599 
600     54 |  ----------------
601     55 |  IN: _start
602     56 |  0x00021088:  0x6a09c002	{	R2 = C9/pc }
603     57 |  0x0002108c:  0xbfe2ff82	{	R2 = add(R2,#0xfffffffc) }
604     58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
605     59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
606     60 |  0x00021098:  0x7800c01e	{	R30 = #0x0 }
607     61 |  0x0002109c:  0x707dc000	{	R0 = R29 }
608     62 |  0x000210a0:  0x763dfe1d	{	R29 = and(R29,#0xfffffff0) }
609     63 |  0x000210a4:  0xa79dfdfe	{	memw(R29+#0xfffffff8) = R29 }
610     64 |  0x000210a8:  0xbffdff1d	{	R29 = add(R29,#0xfffffff8) }
611     65 |  0x000210ac:  0x5a00c002	{	call PC+4 }
612     66 |
613     67 |  General Purpose Registers = {
614     68 |    r0 = 0x00000000
615     69 |    r1 = 0x00000000
616     70 |    r2 = 0x00000000
617     71 |    r3 = 0x00000000
618 
619 Remember, the instructions on lines 56-65 are ran on the CPU state shown below
620 instructions, and as the CPU state has not diverged at this point, we know the
621 starting state is accurate. The bug must then lie within the instructions shown
622 here. Next we may notice that ``R1`` is only touched by lines 57 and 58, that is
623 by
624 
625 ::
626 
627     58 |  0x00021090:  0x9182c001	{	R1 = memw(R2+#0x0) }
628     59 |  0x00021094:  0xf302c101	{	R1 = add(R2,R1) }
629 
630 Therefore, we are either dealing with an correct load instruction
631 ``R1 = memw(R2+#0x0)`` or with an incorrect add ``R1 = add(R2,R1)``. At this
632 point it might be easy enough to go directly to the emitted code for the
633 instructions mentioned and look for bugs, but we could also run
634 ``./qemu-heaxgon -d op,in_asm float_conv`` where we find for the following
635 tinycode for the Hexagon ``add`` instruction
636 
637 ::
638 
639    ---- 00021094
640    mov_i32 pkt_has_store_s1,$0x0
641    add_i32 tmp0,r2,r2
642    mov_i32 loc2,tmp0
643    mov_i32 new_r1,loc2
644    mov_i32 r1,new_r1
645 
646 Here we have finally located our bug ``add_i32 tmp0,r2,r2``.
647 
648 Limitations and Future Development
649 ----------------------------------
650 
651 The main limitation of the current parser is given by the syntax-driven nature
652 of the Bison-generated parsers. This has the severe implication of only being
653 able to generate code in the order of evaluation of the various rules, without,
654 in any case, being able to backtrack and alter the generated code.
655 
656 An example limitation is highlighted by this statement of the input language:
657 
658 ::
659 
660    { (PsV==0xff) ? (PdV=0xff) : (PdV=0x00); }
661 
662 This ternary assignment, when written in this form requires us to emit some
663 proper control flow statements, which emit a jump to the first or to the second
664 code block, whose implementation is extremely convoluted, because when matching
665 the ternary assignment, the code evaluating the two assignments will be already
666 generated.
667 
668 Instead we pre-process that statement, making it become:
669 
670 ::
671 
672    { PdV = ((PsV==0xff)) ? 0xff : 0x00; }
673 
674 Which can be easily matched by the following parser rules:
675 
676 ::
677 
678    statement             | rvalue ';'
679 
680    rvalue                : rvalue QMARK rvalue COLON rvalue
681                          | rvalue EQ rvalue
682                          | LPAR rvalue RPAR
683                          | assign_statement
684                          | IMM
685 
686    assign_statement      : pred ASSIGN rvalue
687 
688 Another example that highlight the limitation of the flex/bison parser can be
689 found even in the add operation we already saw:
690 
691 ::
692 
693    TCGv_i32 tmp_0 = tcg_temp_new_i32();
694    tcg_gen_add_i32(tmp_0, RsV, RtV);
695    tcg_gen_mov_i32(RdV, tmp_0);
696 
697 The fact that we cannot directly use ``RdV`` as the destination of the sum is a
698 consequence of the syntax-driven nature of the parser. In fact when we parse the
699 assignment, the ``rvalue`` token, representing the sum has already been reduced,
700 and thus its code emitted and unchangeable. We rely on the fact that QEMU will
701 optimize our code reducing the useless move operations and the relative
702 temporaries.
703 
704 A possible improvement of the parser regards the support for vectorial
705 instructions and floating point instructions, which will require the extension
706 of the scanner, the parser, and a partial re-design of the type system, allowing
707 to build the vectorial semantics over the available vectorial tinycode generator
708 primitives.
709 
710 A more radical improvement will use the parser, not to generate directly the
711 tinycode generator code, but to generate an intermediate representation like the
712 LLVM IR, which in turn could be compiled using the clang TCG backend. That code
713 could be furtherly optimized, overcoming the limitations of the syntax-driven
714 parsing and could lead to a more optimized generated code.
715