..
   Copyright (C) 2017, Emilio G. Cota <cota@braap.org>
   Copyright (c) 2019, Linaro Limited
   Written by Emilio Cota and Alex Bennée

.. _TCG Plugins:

QEMU TCG Plugins
================

QEMU TCG plugins provide a way for users to run experiments taking
advantage of the total system control emulation can have over a guest.
The framework provides a mechanism for plugins to subscribe to events
during translation and execution and optionally call back into the
plugin during these events. TCG plugins are unable to change the system
state; they can only monitor it passively. However, they can do this
down to individual instruction granularity, including potentially
subscribing to all load and store operations.

Usage
-----

Any QEMU binary with TCG support has plugins enabled by default.
In earlier releases plugin support had to be explicitly enabled with::

  configure --enable-plugins

Once built, a program can be run with multiple plugins loaded, each with
its own arguments::

  $QEMU $OTHER_QEMU_ARGS \
      -plugin contrib/plugins/libhowvec.so,inline=on,count=hint \
      -plugin contrib/plugins/libhotblocks.so

Arguments are plugin specific and can be used to modify their
behaviour. In this case the howvec plugin is being asked to use inline
ops to count and break down the hint instructions by type.
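
Assuming each comma-separated option reaches the plugin as its own
``key=value`` string, splitting one could look like the sketch below.
``split_plugin_arg`` and ``arg_is_enabled`` are hypothetical helpers written
for illustration; they are not part of the QEMU plugin API.

.. code-block:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>

  /*
   * Hypothetical helper (not a QEMU API): split "key=value" in place.
   * Returns false when no '=' is present; *key is then the whole string
   * and *value is NULL, which suits flag-style arguments.
   */
  bool split_plugin_arg(char *arg, char **key, char **value)
  {
      char *eq = strchr(arg, '=');

      *key = arg;
      if (eq == NULL) {
          *value = NULL;
          return false;
      }
      *eq = '\0';       /* terminate the key */
      *value = eq + 1;  /* the value starts after '=' */
      return true;
  }

  /* Accept the "on" and "true" spellings used in this document. */
  bool arg_is_enabled(const char *value)
  {
      return value != NULL &&
             (strcmp(value, "on") == 0 || strcmp(value, "true") == 0);
  }

Applied to ``inline=on`` this yields the key ``inline`` and the value ``on``,
which ``arg_is_enabled`` treats as enabled.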

Linux user-mode emulation also evaluates the environment variable
``QEMU_PLUGIN``::

  QEMU_PLUGIN="file=contrib/plugins/libhowvec.so,inline=on,count=hint" $QEMU

Writing plugins
---------------

API versioning
~~~~~~~~~~~~~~

This is a new feature for QEMU and it allows people to develop
out-of-tree plugins that can be dynamically linked into a running QEMU
process. However, the project reserves the right to change or break the
API should it need to do so. The best way to avoid breakage is to submit
your plugin upstream so it can be updated if/when the API changes.

All plugins need to declare a symbol which exports the plugin API
version they were built against. This can be done simply by::

  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

The core code will refuse to load a plugin that doesn't export a
``qemu_plugin_version`` symbol or if the plugin version is outside of QEMU's
supported range of API versions.

Additionally, the ``qemu_info_t`` structure which is passed to the
``qemu_plugin_install`` method of a plugin will detail the minimum and
current API versions supported by QEMU. The API version will be
incremented if new APIs are added. The minimum API version will be
incremented if existing APIs are changed or removed.
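
The range rule described above can be modelled in a few lines of
standalone C. This is only a sketch: ``struct api_version_range`` and
``plugin_version_supported`` are hypothetical names, and the real structure
is declared in ``include/qemu/qemu-plugin.h``.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical stand-in for the minimum/current pair described above. */
  struct api_version_range {
      int min;  /* oldest API version QEMU still supports */
      int cur;  /* newest API version QEMU implements */
  };

  /* A plugin is accepted only when min <= plugin_version <= cur. */
  bool plugin_version_supported(int plugin_version,
                                struct api_version_range qemu)
  {
      return plugin_version >= qemu.min && plugin_version <= qemu.cur;
  }

Under this model a plugin built against version 1 keeps loading while
new APIs are added (only ``cur`` grows) and stops loading once an
incompatible change raises ``min`` to 2.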

Lifetime of the query handle
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each callback provides an opaque anonymous information handle which
can usually be further queried to find out information about a
translation, instruction or operation. The handles themselves are only
valid during the lifetime of the callback so it is important that any
information that is needed is extracted during the callback and saved
by the plugin.
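
The pattern of copying data out of a short-lived handle can be sketched
in standalone C. ``struct insn_handle`` below is a hypothetical stand-in
for an opaque handle, not a QEMU type; the point is only that the plugin
keeps its own deep copy rather than the handle or pointers into it.

.. code-block:: c

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical handle: owned by the caller, invalid after the callback. */
  struct insn_handle {
      uint64_t vaddr;
      const char *disas;
  };

  /* Plugin-owned copy that outlives the callback. */
  struct insn_record {
      uint64_t vaddr;
      char *disas;
  };

  /* Call from within the callback: deep-copy everything needed later. */
  struct insn_record *save_insn(const struct insn_handle *h)
  {
      struct insn_record *rec = malloc(sizeof(*rec));
      size_t len;

      if (rec == NULL) {
          return NULL;
      }
      len = strlen(h->disas) + 1;
      rec->disas = malloc(len);
      if (rec->disas == NULL) {
          free(rec);
          return NULL;
      }
      memcpy(rec->disas, h->disas, len);
      rec->vaddr = h->vaddr;
      return rec;
  }

  /* Release a record created by save_insn(). */
  void free_insn(struct insn_record *rec)
  {
      if (rec != NULL) {
          free(rec->disas);
          free(rec);
      }
  }

The record stays valid even if the storage behind the handle is reused
once the callback has returned.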

Plugin life cycle
~~~~~~~~~~~~~~~~~

First the plugin is loaded and the public ``qemu_plugin_install``
function is called. The plugin will then register callbacks for various
plugin events. Generally plugins will register a handler for the
*atexit* event if they want to dump a summary of collected information
once the program/system has finished running.

When a registered event occurs the plugin callback is invoked. The
callbacks may provide additional information. In the case of a
translation event the plugin has an option to enumerate the
instructions in a block of instructions and optionally register
callbacks on some or all instructions when they are executed.

There is also a facility to add an inline event where code to
increment a counter can be directly inlined with the translation.
Currently only a simple increment is supported. The increment is not
atomic, so counts can be missed. If you want absolute precision you
should use a callback which can then ensure atomicity itself.
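
What "ensure atomicity itself" can mean is sketched below in plain C11.
A plain ``counter++`` executed from several threads is a non-atomic
read-modify-write and can lose updates, whereas an atomic add cannot. The
function names are hypothetical and no QEMU API is involved.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdint.h>

  /* Shared counter updated from callbacks on several vCPU threads. */
  static atomic_uint_fast64_t insn_count;

  /* Body of a hypothetical per-instruction callback: atomic increment. */
  void count_insn(void)
  {
      atomic_fetch_add_explicit(&insn_count, 1, memory_order_relaxed);
  }

  /* Read the total, for example from an atexit handler. */
  uint64_t read_insn_count(void)
  {
      return atomic_load_explicit(&insn_count, memory_order_relaxed);
  }

Relaxed ordering is enough here because only the final total matters,
not the order in which individual increments become visible.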

Finally when QEMU exits all the registered *atexit* callbacks are
invoked.

Exposure of QEMU internals
~~~~~~~~~~~~~~~~~~~~~~~~~~

The plugin architecture actively avoids leaking implementation details
about how QEMU's translation works to the plugins. While there are
concepts such as translation time and translation blocks, the
details are opaque to plugins. The plugin is able to query select
details of instructions and system configuration only through the
exported *qemu_plugin* functions.

Internals
---------

Locking
~~~~~~~

We have to ensure we cannot deadlock, particularly under MTTCG. For
this we acquire a lock when called from plugin code. We also keep the
list of callbacks under RCU so that we do not have to hold the lock
when calling the callbacks. This is also for performance, since some
callbacks (e.g. memory access callbacks) might be called very
frequently.

  * A consequence of this is that we keep our own list of CPUs, so that
    we do not have to worry about locking order with respect to
    cpu_list_lock.
  * Use a recursive lock, since we can get registration calls from
    callbacks.

As a result registering/unregistering callbacks is "slow", since it
takes a lock. But this is very infrequent; we want performance when
calling (or not calling) callbacks, not when registering them. Using
RCU is great for this.

We support the uninstallation of a plugin at any time (e.g. from
plugin callbacks). This allows plugins to remove themselves if they no
longer want to instrument the code. This operation is asynchronous,
which means callbacks may still occur after the uninstall operation is
requested. The plugin isn't completely uninstalled until the safe work
has executed while all vCPUs are quiescent.

Example Plugins
---------------

There are a number of plugins included with QEMU and you are
encouraged to contribute your own plugins upstream. There is a
``contrib/plugins`` directory where they can go. There are also some
basic plugins that are used to test and exercise the API during the
``make check-tcg`` target in ``tests/plugins``.

- tests/plugins/empty.c

Purely a test plugin for measuring the overhead of the plugins system
itself. Does no instrumentation.

- tests/plugins/bb.c

A very basic plugin which will measure execution in coarse terms as
each basic block is executed. By default the results are shown once
execution finishes::

  $ qemu-aarch64 -plugin tests/plugin/libbb.so \
      -d plugin ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  bb's: 2277338, insns: 158483046

Behaviour can be tweaked with the following arguments:

 * inline=true|false

 Use faster inline addition of a single counter. Not per-cpu and not
 thread safe.

 * idle=true|false

 Dump the current execution stats whenever the guest vCPU idles

- tests/plugins/insn.c

This is a basic instruction level instrumentation which can count the
number of instructions executed on each core/thread::

  $ qemu-aarch64 -plugin tests/plugin/libinsn.so \
      -d plugin ./tests/tcg/aarch64-linux-user/threadcount
  Created 10 threads
  Done
  cpu 0 insns: 46765
  cpu 1 insns: 3694
  cpu 2 insns: 3694
  cpu 3 insns: 2994
  cpu 4 insns: 1497
  cpu 5 insns: 1497
  cpu 6 insns: 1497
  cpu 7 insns: 1497
  total insns: 63135

Behaviour can be tweaked with the following arguments:

 * inline=true|false

 Use faster inline addition of a single counter. Not per-cpu and not
 thread safe.

 * sizes=true|false

 Give a summary of the instruction sizes for the execution

 * match=<string>

 Only instrument instructions matching the string prefix. Will show
 some basic stats including how many instructions have executed since
 the last execution. For example::

   $ qemu-aarch64 -plugin tests/plugin/libinsn.so,match=bl \
       -d plugin ./tests/tcg/aarch64-linux-user/sha512-vector
   ...
   0x40069c, 'bl #0x4002b0', 10 hits, 1093 match hits, Δ+1257 since last match, 98 avg insns/match
   0x4006ac, 'bl #0x403690', 10 hits, 1094 match hits, Δ+47 since last match, 98 avg insns/match
   0x4037fc, 'bl #0x4002b0', 18 hits, 1095 match hits, Δ+22 since last match, 98 avg insns/match
   0x400720, 'bl #0x403690', 10 hits, 1096 match hits, Δ+58 since last match, 98 avg insns/match
   0x4037fc, 'bl #0x4002b0', 19 hits, 1097 match hits, Δ+22 since last match, 98 avg insns/match
   0x400730, 'bl #0x403690', 10 hits, 1098 match hits, Δ+33 since last match, 98 avg insns/match
   0x4037ac, 'bl #0x4002b0', 12 hits, 1099 match hits, Δ+20 since last match, 98 avg insns/match
   ...

For more detailed execution tracing see the ``execlog`` plugin for
other options.
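
The prefix matching behind the ``match`` argument can be illustrated with
a small standalone function. This is a sketch under the assumption that
the comparison is made against the disassembly text; ``insn_matches`` is a
hypothetical name and not taken from the plugin source.

.. code-block:: c

  #include <stdbool.h>
  #include <string.h>

  /* True when the disassembly text starts with the given prefix. */
  bool insn_matches(const char *disas, const char *prefix)
  {
      return strncmp(disas, prefix, strlen(prefix)) == 0;
  }

A pure prefix comparison means that ``match=bl`` would also select
``blr``; a longer prefix narrows the selection.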

- tests/plugins/mem.c

Basic instruction level memory instrumentation::

  $ qemu-aarch64 -plugin tests/plugin/libmem.so,inline=true \
      -d plugin ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  inline mem accesses: 79525013

Behaviour can be tweaked with the following arguments:

 * inline=true|false

 Use faster inline addition of a single counter. Not per-cpu and not
 thread safe.

 * callback=true|false

 Use callbacks on each memory instrumentation.

 * hwaddr=true|false

 Count IO accesses (only for system emulation)

- tests/plugins/syscall.c

A basic syscall tracing plugin. This only works for user-mode. By
default it will give a summary of syscall stats at the end of the
run::

  $ qemu-aarch64 -plugin tests/plugin/libsyscall.so \
      -d plugin ./tests/tcg/aarch64-linux-user/threadcount
  Created 10 threads
  Done
  syscall no.  calls  errors
  226          12     0
  99           11     11
  115          11     0
  222          11     0
  93           10     0
  220          10     0
  233          10     0
  215          8      0
  214          4      0
  134          2      0
  64           2      0
  96           1      0
  94           1      0
  80           1      0
  261          1      0
  78           1      0
  160          1      0
  135          1      0

- contrib/plugins/hotblocks.c

The hotblocks plugin allows you to examine where the hot paths of
execution are in your program. Once the program has finished you will
get a sorted list of blocks reporting the starting PC, translation
count, number of instructions and execution count. This will work best
with linux-user execution as system emulation tends to generate
re-translations as blocks from different programs get swapped in and
out of system memory.

If your program is single-threaded you can use the ``inline`` option for
slightly faster (but not thread safe) counters.
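
Producing such a sorted report can be sketched with ``qsort``. The sort key
is not stated above, so the sketch assumes blocks are ordered by execution
count, highest first; ``struct block_stat`` and ``sort_hot_blocks`` are
hypothetical names.

.. code-block:: c

  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical per-block record mirroring the reported fields. */
  struct block_stat {
      uint64_t pc;      /* starting PC */
      uint64_t tcount;  /* translation count */
      uint64_t icount;  /* number of instructions */
      uint64_t ecount;  /* execution count */
  };

  /* qsort comparator: highest execution count first. */
  static int cmp_ecount_desc(const void *a, const void *b)
  {
      const struct block_stat *x = a;
      const struct block_stat *y = b;

      /* Compare explicitly; subtracting 64-bit counts could overflow int. */
      return (x->ecount < y->ecount) - (x->ecount > y->ecount);
  }

  /* Sort an array of n block records in place. */
  void sort_hot_blocks(struct block_stat *blocks, size_t n)
  {
      qsort(blocks, n, sizeof(*blocks), cmp_ecount_desc);
  }

After sorting, the first entries are the blocks that executed most often.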

Example::

  $ qemu-aarch64 \
    -plugin contrib/plugins/libhotblocks.so -d plugin \
    ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  collected 903 entries in the hash table
  pc, tcount, icount, ecount
  0x0000000041ed10, 1, 5, 66087
  0x000000004002b0, 1, 4, 66087
  ...

- contrib/plugins/hotpages.c

Similar to hotblocks but this time tracks memory accesses::

  $ qemu-aarch64 \
    -plugin contrib/plugins/libhotpages.so -d plugin \
    ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  Addr, RCPUs, Reads, WCPUs, Writes
  0x000055007fe000, 0x0001, 31747952, 0x0001, 8835161
  0x000055007ff000, 0x0001, 29001054, 0x0001, 8780625
  0x00005500800000, 0x0001, 687465, 0x0001, 335857
  0x0000000048b000, 0x0001, 130594, 0x0001, 355
  0x0000000048a000, 0x0001, 1826, 0x0001, 11

The hotpages plugin can be configured using the following arguments:

  * sortby=reads|writes|address

  Log the data sorted by either the number of reads, the number of writes, or
  memory address. (Default: entries are sorted by the sum of reads and writes)

  * io=on

  Track IO addresses. Only relevant to full system emulation. (Default: off)

  * pagesize=N

  The page size used. (Default: N = 4096)
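
How an access address maps to a page entry such as the ones listed in the
output above can be sketched as follows. The sketch assumes the page size
is a power of two, as the default of 4096 is; ``page_base`` is a
hypothetical helper, not code from the plugin.

.. code-block:: c

  #include <stdint.h>

  /*
   * Round an address down to the start of its page.
   * Assumes page_size is a power of two.
   */
  uint64_t page_base(uint64_t addr, uint64_t page_size)
  {
      return addr & ~(page_size - 1);
  }

With a page size of 4096 every address from ``0x48b000`` to ``0x48bfff``
maps to the page ``0x48b000``.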

- contrib/plugins/howvec.c

This is an instruction classifier so can be used to count different
types of instructions. It has a number of options to refine which get
counted. You can give a value to the ``count`` argument for a class of
instructions to break it down fully, so for example to see all the system
register accesses::

  $ qemu-system-aarch64 $(QEMU_ARGS) \
    -append "root=/dev/sda2 systemd.unit=benchmark.service" \
    -smp 4 -plugin ./contrib/plugins/libhowvec.so,count=sreg -d plugin

which will lead to a sorted list after the class breakdown::

  Instruction Classes:
  Class:   UDEF                   not counted
  Class:   SVE                    (68 hits)
  Class:   PCrel addr             (47789483 hits)
  Class:   Add/Sub (imm)          (192817388 hits)
  Class:   Logical (imm)          (93852565 hits)
  Class:   Move Wide (imm)        (76398116 hits)
  Class:   Bitfield               (44706084 hits)
  Class:   Extract                (5499257 hits)
  Class:   Cond Branch (imm)      (147202932 hits)
  Class:   Exception Gen          (193581 hits)
  Class:     NOP                  not counted
  Class:   Hints                  (6652291 hits)
  Class:   Barriers               (8001661 hits)
  Class:   PSTATE                 (1801695 hits)
  Class:   System Insn            (6385349 hits)
  Class:   System Reg             counted individually
  Class:   Branch (reg)           (69497127 hits)
  Class:   Branch (imm)           (84393665 hits)
  Class:   Cmp & Branch           (110929659 hits)
  Class:   Tst & Branch           (44681442 hits)
  Class:   AdvSimd ldstmult       (736 hits)
  Class:   ldst excl              (9098783 hits)
  Class:   Load Reg (lit)         (87189424 hits)
  Class:   ldst noalloc pair      (3264433 hits)
  Class:   ldst pair              (412526434 hits)
  Class:   ldst reg (imm)         (314734576 hits)
  Class: Loads & Stores           (2117774 hits)
  Class: Data Proc Reg            (223519077 hits)
  Class: Scalar FP                (31657954 hits)
  Individual Instructions:
  Instr: mrs x0, sp_el0           (2682661 hits)  (op=0xd5384100/  System Reg)
  Instr: mrs x1, tpidr_el2        (1789339 hits)  (op=0xd53cd041/  System Reg)
  Instr: mrs x2, tpidr_el2        (1513494 hits)  (op=0xd53cd042/  System Reg)
  Instr: mrs x0, tpidr_el2        (1490823 hits)  (op=0xd53cd040/  System Reg)
  Instr: mrs x1, sp_el0           (933793 hits)   (op=0xd5384101/  System Reg)
  Instr: mrs x2, sp_el0           (699516 hits)   (op=0xd5384102/  System Reg)
  Instr: mrs x4, tpidr_el2        (528437 hits)   (op=0xd53cd044/  System Reg)
  Instr: mrs x30, ttbr1_el1       (480776 hits)   (op=0xd538203e/  System Reg)
  Instr: msr ttbr1_el1, x30       (480713 hits)   (op=0xd518203e/  System Reg)
  Instr: msr vbar_el1, x30        (480671 hits)   (op=0xd518c01e/  System Reg)
  ...

To find the argument shorthand for the class you need to examine the
source code of the plugin at the moment, specifically the ``*opt``
argument in the InsnClassExecCount tables.
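
A table-driven classifier of this kind can be sketched as a mask and
pattern lookup. The structure layout and the mask/pattern values below are
assumptions chosen so that the ``mrs``/``msr`` opcodes shown above match; the
plugin source remains the reference for the real tables.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* One row of a hypothetical classification table. */
  struct insn_class {
      const char *name;  /* label printed in the report */
      const char *opt;   /* shorthand accepted by the count= argument */
      uint32_t mask;     /* opcode bits that take part in the match */
      uint32_t pattern;  /* required value of those bits */
  };

  /* Illustrative values only; the last row matches anything. */
  static const struct insn_class classes[] = {
      { "System Reg",   "sreg", 0xffd00000u, 0xd5100000u },
      { "Unclassified", NULL,   0x00000000u, 0x00000000u },
  };

  /* Return the first class whose masked bits equal its pattern. */
  const struct insn_class *classify(uint32_t opcode)
  {
      for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
          if ((opcode & classes[i].mask) == classes[i].pattern) {
              return &classes[i];
          }
      }
      return NULL;
  }

Because the first matching row wins, more specific patterns have to be
listed before more general ones.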

- contrib/plugins/lockstep.c

This is a debugging tool for developers who want to find out when and
where execution diverges after a subtle change to TCG code generation.
It is not an exact science and results are likely to be mixed once
asynchronous events are introduced. While the use of ``-icount`` can
introduce determinism to the execution flow, it doesn't always follow
that the translation sequence will be exactly the same. Typically this is
caused by a timer firing to service the GUI causing a block to end
early. However in some cases it has proved to be useful in pointing
people at roughly where execution diverges. The only argument you need
for the plugin is a path for the socket the two instances will
communicate over::

  $ qemu-system-sparc -monitor none -parallel none \
    -net none -M SS-20 -m 256 -kernel day11/zImage.elf \
    -plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \
    -d plugin,nochain

which will eventually report::

  qemu-system-sparc: warning: nic lance.0 has no peer
  @ 0x000000ffd06678 vs 0x000000ffd001e0 (2/1 since last)
  @ 0x000000ffd07d9c vs 0x000000ffd06678 (3/1 since last)
  Δ insn_count @ 0x000000ffd07d9c (809900609) vs 0x000000ffd06678 (809900612)
    previously @ 0x000000ffd06678/10 (809900609 insns)
    previously @ 0x000000ffd001e0/4 (809900599 insns)
    previously @ 0x000000ffd080ac/2 (809900595 insns)
    previously @ 0x000000ffd08098/5 (809900593 insns)
    previously @ 0x000000ffd080c0/1 (809900588 insns)

- contrib/plugins/hwprofile.c

The hwprofile tool can only be used with system emulation and allows
the user to see which hardware is accessed and how often. It has a
number of options:

 * track=read or track=write

 By default the plugin tracks both reads and writes. You can use one
 of these options to limit the tracking to just one class of accesses.

 * source

 Will include a detailed breakdown of which guest PC made each
 access. Not compatible with the pattern option. Example output::

   cirrus-low-memory @ 0xfffffd00000a0000
    pc:fffffc0000005cdc, 1, 256
    pc:fffffc0000005ce8, 1, 256
    pc:fffffc0000005cec, 1, 256

 * pattern

 Instead break down the accesses based on the offset into the HW
 region. This can be useful for seeing the most used registers of a
 device. Example output::

    pci0-conf @ 0xfffffd01fe000000
      off:00000004, 1, 1
      off:00000010, 1, 3
      off:00000014, 1, 3
      off:00000018, 1, 2
      off:0000001c, 1, 2
      off:00000020, 1, 2
      ...

- contrib/plugins/execlog.c

The execlog tool traces executed instructions with memory access. It can be used
for debugging and security analysis purposes.
Please be aware that this will generate a lot of output.

The plugin can be run without any extra arguments::

  $ qemu-system-arm $(QEMU_ARGS) \
    -plugin ./contrib/plugins/libexeclog.so -d plugin

which will output an execution trace following this structure::

  # vCPU, vAddr, opcode, disassembly[, load/store, memory addr, device]...
  0, 0xa12, 0xf8012400, "movs r4, #0"
  0, 0xa14, 0xf87f42b4, "cmp r4, r6"
  0, 0xa16, 0xd206, "bhs #0xa26"
  0, 0xa18, 0xfff94803, "ldr r0, [pc, #0xc]", load, 0x00010a28, RAM
  0, 0xa1a, 0xf989f000, "bl #0xd30"
  0, 0xd30, 0xfff9b510, "push {r4, lr}", store, 0x20003ee0, RAM, store, 0x20003ee4, RAM
  0, 0xd32, 0xf9893014, "adds r0, #0x14"
  0, 0xd34, 0xf9c8f000, "bl #0x10c8"
  0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM

The output can be filtered to only track certain instructions or
addresses using the ``ifilter`` or ``afilter`` options. You can stack the
arguments if required::

  $ qemu-system-arm $(QEMU_ARGS) \
    -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d plugin

- contrib/plugins/cache.c

Cache modelling plugin that measures the performance of a given L1 cache
configuration, and optionally a unified L2 per-core cache, when a given working
set is run::

  $ qemu-x86_64 -plugin ./contrib/plugins/libcache.so \
      -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs

will report the following::

    core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
    0       996695         508             0.0510%  2642799        18617           0.7044%

    address, data misses, instruction
    0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx)
    0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax)
    0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax)
    0x454d48 (__tunables_init), 20, cmpb $0, (%r8)
    ...

    address, fetch misses, instruction
    0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx
    0x41f0a0 (_IO_setb), 744, endbr64
    0x415882 (__vfprintf_internal), 744, movq %r12, %rdi
    0x4268a0 (__malloc), 696, andq $0xfffffffffffffff0, %rax
    ...

The plugin has a number of arguments, all of which are optional:

  * limit=N

  Print the top N icache and dcache thrashing instructions along with their
  addresses, number of misses, and disassembly. (default: 32)

  * icachesize=N
  * iblksize=B
  * iassoc=A

  Instruction cache configuration arguments. They specify the cache size, block
  size, and associativity of the instruction cache, respectively.
  (default: N = 16384, B = 64, A = 8)

  * dcachesize=N
  * dblksize=B
  * dassoc=A

  Data cache configuration arguments. They specify the cache size, block size,
  and associativity of the data cache, respectively.
  (default: N = 16384, B = 64, A = 8)

  * evict=POLICY

  Sets the eviction policy to POLICY. Available policies are: :code:`lru`,
  :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for
  both instruction and data caches. (default: POLICY = :code:`lru`)

  * cores=N

  Sets the number of cores for which we maintain separate icache and dcache.
  (default: for linux-user, N = 1, for full system emulation: N = cores
  available to guest)

  * l2=on

  Simulates a unified L2 cache (stores blocks for both instructions and data)
  using the default L2 configuration (cache size = 2MB, associativity = 16-way,
  block size = 64B).

  * l2cachesize=N
  * l2blksize=B
  * l2assoc=A

  L2 cache configuration arguments. They specify the cache size, block size, and
  associativity of the L2 cache, respectively. Setting any of the L2
  configuration arguments implies ``l2=on``.
  (default: N = 2097152 (2MB), B = 64, A = 16)
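
How the size, block size and associativity arguments relate to each other
can be sketched with the usual set-associative arithmetic. This assumes a
conventional layout and does not reproduce the plugin's internal data
structures; all names are hypothetical.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical description of one set-associative cache. */
  struct cache_geometry {
      uint64_t cache_size;  /* total bytes, e.g. icachesize= */
      uint64_t block_size;  /* bytes per block, e.g. iblksize= */
      uint64_t assoc;       /* blocks per set, e.g. iassoc= */
  };

  /* Number of sets = cache size / (block size * associativity). */
  uint64_t cache_num_sets(struct cache_geometry g)
  {
      return g.cache_size / (g.block_size * g.assoc);
  }

  /* Set selected by an address: block number modulo the number of sets. */
  uint64_t cache_set_index(struct cache_geometry g, uint64_t addr)
  {
      return (addr / g.block_size) % cache_num_sets(g);
  }

  /* Tag: the part of the block number above the set index. */
  uint64_t cache_tag(struct cache_geometry g, uint64_t addr)
  {
      return (addr / g.block_size) / cache_num_sets(g);
  }

With the defaults listed above, the L1 caches have 16384 / (64 * 8) = 32
sets and the L2 cache has 2097152 / (64 * 16) = 2048 sets.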

API
---

The following API is generated from the inline documentation in
``include/qemu/qemu-plugin.h``. Please ensure any updates to the API
include the full kernel-doc annotations.

.. kernel-doc:: include/qemu/qemu-plugin.h
587