1.. _tracing: 2 3======= 4Tracing 5======= 6 7Introduction 8============ 9 10This document describes the tracing infrastructure in QEMU and how to use it 11for debugging, profiling, and observing execution. 12 13Quickstart 14========== 15 16Enable tracing of ``memory_region_ops_read`` and ``memory_region_ops_write`` 17events:: 18 19 $ qemu --trace "memory_region_ops_*" ... 20 ... 21 719585@1608130130.441188:memory_region_ops_read cpu 0 mr 0x562fdfbb3820 addr 0x3cc value 0x67 size 1 22 719585@1608130130.441190:memory_region_ops_write cpu 0 mr 0x562fdfbd2f00 addr 0x3d4 value 0x70e size 2 23 24This output comes from the "log" trace backend that is enabled by default when 25``./configure --enable-trace-backends=BACKENDS`` was not explicitly specified. 26 27Multiple patterns can be specified by repeating the ``--trace`` option:: 28 29 $ qemu --trace "kvm_*" --trace "virtio_*" ... 30 31When patterns are used frequently it is more convenient to store them in a 32file to avoid long command-line options:: 33 34 $ echo "memory_region_ops_*" >/tmp/events 35 $ echo "kvm_*" >>/tmp/events 36 $ qemu --trace events=/tmp/events ... 37 38Trace events 39============ 40 41Sub-directory setup 42------------------- 43 44Each directory in the source tree can declare a set of trace events in a local 45"trace-events" file. All directories which contain "trace-events" files must be 46listed in the "trace_events_subdirs" variable in the top level meson.build 47file. During build, the "trace-events" file in each listed subdirectory will be 48processed by the "tracetool" script to generate code for the trace events. 49 50The individual "trace-events" files are merged into a "trace-events-all" file, 51which is also installed into "/usr/share/qemu" with the name "trace-events". 52This merged file is to be used by the "simpletrace.py" script to later analyse 53traces in the simpletrace data format. 54 55The following files are automatically generated in <builddir>/trace/ during the 56build: 57 58 - trace-<subdir>.c - the trace event state declarations 59 - trace-<subdir>.h - the trace event enums and probe functions 60 - trace-dtrace-<subdir>.h - DTrace event probe specification 61 - trace-dtrace-<subdir>.dtrace - DTrace event probe helper declaration 62 - trace-dtrace-<subdir>.o - binary DTrace provider (generated by dtrace) 63 - trace-ust-<subdir>.h - UST event probe helper declarations 64 65Here <subdir> is the sub-directory path with '/' replaced by '_'. For example, 66"accel/kvm" becomes "accel_kvm" and the final filename for "trace-<subdir>.c" 67becomes "trace-accel_kvm.c". 68 69Source files in the source tree do not directly include generated files in 70"<builddir>/trace/". Instead they #include the local "trace.h" file, without 71any sub-directory path prefix. eg io/channel-buffer.c would do:: 72 73 #include "trace.h" 74 75The "io/trace.h" file must be created manually with an #include of the 76corresponding "trace/trace-<subdir>.h" file that will be generated in the 77builddir:: 78 79 $ echo '#include "trace/trace-io.h"' >io/trace.h 80 81While it is possible to include a trace.h file from outside a source file's own 82sub-directory, this is discouraged in general. It is strongly preferred that 83all events be declared directly in the sub-directory that uses them. The only 84exception is where there are some shared trace events defined in the top level 85directory trace-events file. The top level directory generates trace files 86with a filename prefix of "trace/trace-root" instead of just "trace". This is 87to avoid ambiguity between a trace.h in the current directory, vs the top level 88directory. 89 90Using trace events 91------------------ 92 93Trace events are invoked directly from source code like this:: 94 95 #include "trace.h" /* needed for trace event prototype */ 96 97 void *qemu_vmalloc(size_t size) 98 { 99 void *ptr; 100 size_t align = QEMU_VMALLOC_ALIGN; 101 102 if (size < align) { 103 align = getpagesize(); 104 } 105 ptr = qemu_memalign(align, size); 106 trace_qemu_vmalloc(size, ptr); 107 return ptr; 108 } 109 110Declaring trace events 111---------------------- 112 113The "tracetool" script produces the trace.h header file which is included by 114every source file that uses trace events. Since many source files include 115trace.h, it uses a minimum of types and other header files included to keep the 116namespace clean and compile times and dependencies down. 117 118Trace events should use types as follows: 119 120 * Use stdint.h types for fixed-size types. Most offsets and guest memory 121 addresses are best represented with uint32_t or uint64_t. Use fixed-size 122 types over primitive types whose size may change depending on the host 123 (32-bit versus 64-bit) so trace events don't truncate values or break 124 the build. 125 126 * Use void * for pointers to structs or for arrays. The trace.h header 127 cannot include all user-defined struct declarations and it is therefore 128 necessary to use void * for pointers to structs. 129 130 * For everything else, use primitive scalar types (char, int, long) with the 131 appropriate signedness. 132 133 * Avoid floating point types (float and double) because SystemTap does not 134 support them. In most cases it is possible to round to an integer type 135 instead. This may require scaling the value first by multiplying it by 1000 136 or the like when digits after the decimal point need to be preserved. 137 138Format strings should reflect the types defined in the trace event. Take 139special care to use PRId64 and PRIu64 for int64_t and uint64_t types, 140respectively. This ensures portability between 32- and 64-bit platforms. 141Format strings must not end with a newline character. It is the responsibility 142of backends to adapt line ending for proper logging. 143 144Each event declaration will start with the event name, then its arguments, 145finally a format string for pretty-printing. For example:: 146 147 qemu_vmalloc(size_t size, void *ptr) "size %zu ptr %p" 148 qemu_vfree(void *ptr) "ptr %p" 149 150 151Hints for adding new trace events 152--------------------------------- 153 1541. Trace state changes in the code. Interesting points in the code usually 155 involve a state change like starting, stopping, allocating, freeing. State 156 changes are good trace events because they can be used to understand the 157 execution of the system. 158 1592. Trace guest operations. Guest I/O accesses like reading device registers 160 are good trace events because they can be used to understand guest 161 interactions. 162 1633. Use correlator fields so the context of an individual line of trace output 164 can be understood. For example, trace the pointer returned by malloc and 165 used as an argument to free. This way mallocs and frees can be matched up. 166 Trace events with no context are not very useful. 167 1684. Name trace events after their function. If there are multiple trace events 169 in one function, append a unique distinguisher at the end of the name. 170 171Generic interface and monitor commands 172====================================== 173 174You can programmatically query and control the state of trace events through a 175backend-agnostic interface provided by the header "trace/control.h". 176 177Note that some of the backends do not provide an implementation for some parts 178of this interface, in which case QEMU will just print a warning (please refer to 179header "trace/control.h" to see which routines are backend-dependent). 180 181The state of events can also be queried and modified through monitor commands: 182 183* ``info trace-events`` 184 View available trace events and their state. State 1 means enabled, state 0 185 means disabled. 186 187* ``trace-event NAME on|off`` 188 Enable/disable a given trace event or a group of events (using wildcards). 189 190The "--trace events=<file>" command line argument can be used to enable the 191events listed in <file> from the very beginning of the program. This file must 192contain one event name per line. 193 194If a line in the "--trace events=<file>" file begins with a '-', the trace event 195will be disabled instead of enabled. This is useful when a wildcard was used 196to enable an entire family of events but one noisy event needs to be disabled. 197 198Wildcard matching is supported in both the monitor command "trace-event" and the 199events list file. That means you can enable/disable the events having a common 200prefix in a batch. For example, virtio-blk trace events could be enabled using 201the following monitor command:: 202 203 trace-event virtio_blk_* on 204 205Trace backends 206============== 207 208The "tracetool" script automates tedious trace event code generation and also 209keeps the trace event declarations independent of the trace backend. The trace 210events are not tightly coupled to a specific trace backend, such as LTTng or 211SystemTap. Support for trace backends can be added by extending the "tracetool" 212script. 213 214The trace backends are chosen at configure time:: 215 216 ./configure --enable-trace-backends=simple,dtrace 217 218For a list of supported trace backends, try ./configure --help or see below. 219If multiple backends are enabled, the trace is sent to them all. 220 221If no backends are explicitly selected, configure will default to the 222"log" backend. 223 224The following subsections describe the supported trace backends. 225 226Nop 227--- 228 229The "nop" backend generates empty trace event functions so that the compiler 230can optimize out trace events completely. This imposes no performance 231penalty. 232 233Note that regardless of the selected trace backend, events with the "disable" 234property will be generated with the "nop" backend. 235 236Log 237--- 238 239The "log" backend sends trace events directly to standard error. This 240effectively turns trace events into debug printfs. 241 242This is the simplest backend and can be used together with existing code that 243uses DPRINTF(). 244 245The -msg timestamp=on|off command-line option controls whether or not to print 246the tid/timestamp prefix for each trace event. 247 248Simpletrace 249----------- 250 251The "simple" backend writes binary trace logs to a file from a thread, making 252it lower overhead than the "log" backend. A Python API is available for writing 253offline trace file analysis scripts. It may not be as powerful as 254platform-specific or third-party trace backends but it is portable and has no 255special library dependencies. 256 257Monitor commands 258~~~~~~~~~~~~~~~~ 259 260* ``trace-file on|off|flush|set <path>`` 261 Enable/disable/flush the trace file or set the trace file name. 262 263Analyzing trace files 264~~~~~~~~~~~~~~~~~~~~~ 265 266The "simple" backend produces binary trace files that can be formatted with the 267simpletrace.py script. The script takes the "trace-events-all" file and the 268binary trace:: 269 270 ./scripts/simpletrace.py trace-events-all trace-12345 271 272You must ensure that the same "trace-events-all" file was used to build QEMU, 273otherwise trace event declarations may have changed and output will not be 274consistent. 275 276Ftrace 277------ 278 279The "ftrace" backend writes trace data to ftrace marker. This effectively 280sends trace events to ftrace ring buffer, and you can compare qemu trace 281data and kernel(especially kvm.ko when using KVM) trace data. 282 283if you use KVM, enable kvm events in ftrace:: 284 285 # echo 1 > /sys/kernel/debug/tracing/events/kvm/enable 286 287After running qemu by root user, you can get the trace:: 288 289 # cat /sys/kernel/debug/tracing/trace 290 291Restriction: "ftrace" backend is restricted to Linux only. 292 293Syslog 294------ 295 296The "syslog" backend sends trace events using the POSIX syslog API. The log 297is opened specifying the LOG_DAEMON facility and LOG_PID option (so events 298are tagged with the pid of the particular QEMU process that generated 299them). All events are logged at LOG_INFO level. 300 301NOTE: syslog may squash duplicate consecutive trace events and apply rate 302 limiting. 303 304Restriction: "syslog" backend is restricted to POSIX compliant OS. 305 306LTTng Userspace Tracer 307---------------------- 308 309The "ust" backend uses the LTTng Userspace Tracer library. There are no 310monitor commands built into QEMU, instead UST utilities should be used to list, 311enable/disable, and dump traces. 312 313Package lttng-tools is required for userspace tracing. You must ensure that the 314current user belongs to the "tracing" group, or manually launch the 315lttng-sessiond daemon for the current user prior to running any instance of 316QEMU. 317 318While running an instrumented QEMU, LTTng should be able to list all available 319events:: 320 321 lttng list -u 322 323Create tracing session:: 324 325 lttng create mysession 326 327Enable events:: 328 329 lttng enable-event qemu:g_malloc -u 330 331Where the events can either be a comma-separated list of events, or "-a" to 332enable all tracepoint events. Start and stop tracing as needed:: 333 334 lttng start 335 lttng stop 336 337View the trace:: 338 339 lttng view 340 341Destroy tracing session:: 342 343 lttng destroy 344 345Babeltrace can be used at any later time to view the trace:: 346 347 babeltrace $HOME/lttng-traces/mysession-<date>-<time> 348 349SystemTap 350--------- 351 352The "dtrace" backend uses DTrace sdt probes but has only been tested with 353SystemTap. When SystemTap support is detected a .stp file with wrapper probes 354is generated to make use in scripts more convenient. This step can also be 355performed manually after a build in order to change the binary name in the .stp 356probes:: 357 358 scripts/tracetool.py --backends=dtrace --format=stap \ 359 --binary path/to/qemu-binary \ 360 --target-type system \ 361 --target-name x86_64 \ 362 --group=all \ 363 trace-events-all \ 364 qemu.stp 365 366To facilitate simple usage of systemtap where there merely needs to be printf 367logging of certain probes, a helper script "qemu-trace-stap" is provided. 368Consult its manual page for guidance on its usage. 369 370Trace event properties 371====================== 372 373Each event in the "trace-events-all" file can be prefixed with a space-separated 374list of zero or more of the following event properties. 375 376"disable" 377--------- 378 379If a specific trace event is going to be invoked a huge number of times, this 380might have a noticeable performance impact even when the event is 381programmatically disabled. 382 383In this case you should declare such event with the "disable" property. This 384will effectively disable the event at compile time (by using the "nop" backend), 385thus having no performance impact at all on regular builds (i.e., unless you 386edit the "trace-events-all" file). 387 388In addition, there might be cases where relatively complex computations must be 389performed to generate values that are only used as arguments for a trace 390function. In these cases you can use 'trace_event_get_state_backends()' to 391guard such computations, so they are skipped if the event has been either 392compile-time disabled or run-time disabled. If the event is compile-time 393disabled, this check will have no performance impact. 394 395:: 396 397 #include "trace.h" /* needed for trace event prototype */ 398 399 void *qemu_vmalloc(size_t size) 400 { 401 void *ptr; 402 size_t align = QEMU_VMALLOC_ALIGN; 403 404 if (size < align) { 405 align = getpagesize(); 406 } 407 ptr = qemu_memalign(align, size); 408 if (trace_event_get_state_backends(TRACE_QEMU_VMALLOC)) { 409 void *complex; 410 /* some complex computations to produce the 'complex' value */ 411 trace_qemu_vmalloc(size, ptr, complex); 412 } 413 return ptr; 414 } 415 416"tcg" 417----- 418 419Guest code generated by TCG can be traced by defining an event with the "tcg" 420event property. Internally, this property generates two events: 421"<eventname>_trans" to trace the event at translation time, and 422"<eventname>_exec" to trace the event at execution time. 423 424Instead of using these two events, you should instead use the function 425"trace_<eventname>_tcg" during translation (TCG code generation). This function 426will automatically call "trace_<eventname>_trans", and will generate the 427necessary TCG code to call "trace_<eventname>_exec" during guest code execution. 428 429Events with the "tcg" property can be declared in the "trace-events" file with a 430mix of native and TCG types, and "trace_<eventname>_tcg" will gracefully forward 431them to the "<eventname>_trans" and "<eventname>_exec" events. Since TCG values 432are not known at translation time, these are ignored by the "<eventname>_trans" 433event. Because of this, the entry in the "trace-events" file needs two printing 434formats (separated by a comma):: 435 436 tcg foo(uint8_t a1, TCGv_i32 a2) "a1=%d", "a1=%d a2=%d" 437 438For example:: 439 440 #include "trace-tcg.h" 441 442 void some_disassembly_func (...) 443 { 444 uint8_t a1 = ...; 445 TCGv_i32 a2 = ...; 446 trace_foo_tcg(a1, a2); 447 } 448 449This will immediately call:: 450 451 void trace_foo_trans(uint8_t a1); 452 453and will generate the TCG code to call:: 454 455 void trace_foo(uint8_t a1, uint32_t a2); 456 457"vcpu" 458------ 459 460Identifies events that trace vCPU-specific information. It implicitly adds a 461"CPUState*" argument, and extends the tracing print format to show the vCPU 462information. If used together with the "tcg" property, it adds a second 463"TCGv_env" argument that must point to the per-target global TCG register that 464points to the vCPU when guest code is executed (usually the "cpu_env" variable). 465 466The "tcg" and "vcpu" properties are currently only honored in the root 467./trace-events file. 468 469The following example events:: 470 471 foo(uint32_t a) "a=%x" 472 vcpu bar(uint32_t a) "a=%x" 473 tcg vcpu baz(uint32_t a) "a=%x", "a=%x" 474 475Can be used as:: 476 477 #include "trace-tcg.h" 478 479 CPUArchState *env; 480 TCGv_ptr cpu_env; 481 482 void some_disassembly_func(...) 483 { 484 /* trace emitted at this point */ 485 trace_foo(0xd1); 486 /* trace emitted at this point */ 487 trace_bar(env_cpu(env), 0xd2); 488 /* trace emitted at this point (env) and when guest code is executed (cpu_env) */ 489 trace_baz_tcg(env_cpu(env), cpu_env, 0xd3); 490 } 491 492If the translating vCPU has address 0xc1 and code is later executed by vCPU 4930xc2, this would be an example output:: 494 495 // at guest code translation 496 foo a=0xd1 497 bar cpu=0xc1 a=0xd2 498 baz_trans cpu=0xc1 a=0xd3 499 // at guest code execution 500 baz_exec cpu=0xc2 a=0xd3 501