========
Fuzzing
========

This document describes the virtual-device fuzzing infrastructure in QEMU and
how to use it to implement additional fuzzers.

Basics
------

Fuzzing operates by passing inputs to an entry point/target function. The
fuzzer tracks the code coverage triggered by the input. Based on these
findings, the fuzzer mutates the input and repeats the fuzzing.

To fuzz QEMU, we rely on libFuzzer. Unlike other fuzzers such as AFL, libFuzzer
is an *in-process* fuzzer. For the developer, this means that it is their
responsibility to ensure that state is reset between fuzzing runs.
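
As a minimal sketch of what resetting state means for an in-process fuzzer,
consider the toy target below. The ``toy_dev`` structure and
``toy_dev_reset()`` are hypothetical and not part of QEMU; only the
``LLVMFuzzerTestOneInput`` entry point belongs to libFuzzer. Because the
process is reused, state left behind by one input would otherwise influence
the next one:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical "device" state that lives for the whole process. */
    static struct {
        uint32_t regs[4];
        unsigned writes;
    } toy_dev;

    /* Hypothetical helper: return the toy device to its power-on state. */
    static void toy_dev_reset(void)
    {
        for (size_t i = 0; i < 4; i++) {
            toy_dev.regs[i] = 0;
        }
        toy_dev.writes = 0;
    }

    /* libFuzzer calls this once per input, always in the same process. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        /* Treat each input byte as a write to one of four registers. */
        for (size_t i = 0; i < size; i++) {
            toy_dev.regs[data[i] & 3] += data[i];
            toy_dev.writes++;
        }

        /* Without this reset, the next input would start from stale state. */
        toy_dev_reset();
        return 0;
    }

A file like this can be built into a standalone fuzzer with
``clang -fsanitize=fuzzer``. QEMU's real targets have far more state to reset
than this toy example.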

Building the fuzzers
--------------------

To build the fuzzers, install a recent version of clang, then configure the
build as shown here (substitute the clang binaries with the version you
installed). ``--enable-asan`` and ``--enable-ubsan`` are optional, but they
allow us to reliably detect bugs such as out-of-bounds accesses,
uses-after-free, double-frees, etc.::

    CC=clang-8 CXX=clang++-8 /path/to/configure \
        --enable-fuzzing --enable-asan --enable-ubsan

Fuzz targets are built similarly to system targets::

    make qemu-fuzz-i386

This builds ``./qemu-fuzz-i386``.

The first option to this command is ``--fuzz-target=FUZZ_NAME``.
To list all of the available fuzzers, run ``qemu-fuzz-i386`` with no arguments.

For example::

    ./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz

Internally, libFuzzer parses all arguments that do not begin with ``"--"``.
Information about these is available by passing ``-help=1``.

Now the only thing left to do is wait for the fuzzer to trigger potential
crashes.

Useful libFuzzer flags
----------------------

As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
list the available arguments. In particular, these arguments might be helpful:

* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
  libFuzzer stores each "interesting" input in this corpus directory. The next
  time you run libFuzzer, it will read all of the inputs from the corpus, and
  continue fuzzing from there. You can also specify multiple directories.
  libFuzzer loads existing inputs from all specified directories, but will only
  write new ones to the first one specified.

* ``-max_len=4096`` : Specify the maximum byte-length of the inputs libFuzzer
  will generate.

* ``-close_fd_mask={1,2,3}`` : Close stdout (1), stderr (2), or both (3). Useful
  for targets that trigger many debug/error messages, or create output on the
  serial console.

* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers in
  parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
  ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or equal
  to half the available CPU cores. Replace 4 with a number appropriate for your
  machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel
  fuzzers to share information about the interesting inputs they find.

* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
  ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
  coverage table. Useful for targets with "magic" constants. If Arg1 came from
  the fuzzer's input and Arg2 is a magic constant, then each time the Hamming
  distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the
  corpus.

* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
  better coverage performance, depending on the target.

Note that libFuzzer's exact behavior will depend on the version of
clang and libFuzzer used to build the device fuzzers.
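
To make the ``-use_value_profile=1`` formula concrete, the following sketch
evaluates it for a single comparison. The function name
``value_profile_feature`` is ours, and the code only mirrors the expression
quoted in the flag description; it is not taken from libFuzzer's sources:

.. code-block:: c

    #include <stdint.h>

    /*
     * Illustrative only: mirrors the documented expression
     * (caller_pc & 4095) | (popcnt(Arg1 ^ Arg2) << 12).
     */
    static uint64_t value_profile_feature(uint64_t caller_pc,
                                          uint64_t arg1, uint64_t arg2)
    {
        /* __builtin_popcountll is a GCC/Clang builtin that counts set bits. */
        uint64_t hamming = (uint64_t)__builtin_popcountll(arg1 ^ arg2);

        return (caller_pc & 4095) | (hamming << 12);
    }

With ``arg2`` fixed, every reduction in the Hamming distance changes the
returned value, so an input that gets one bit closer to a magic constant
appears as a new entry in the coverage table.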

Generating Coverage Reports
---------------------------

Code coverage is a crucial metric for evaluating a fuzzer's performance.
libFuzzer's output includes a "cov: " column that reports the total number of
unique blocks/edges covered. To examine coverage on a line-by-line basis we
can use Clang coverage:

 1. Configure libFuzzer to store a corpus of all interesting inputs (see
    ``CORPUS_DIR`` above).
 2. ``./configure`` the QEMU build with ::

      --enable-fuzzing \
      --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"

 3. Re-run the fuzzer. Specify ``$CORPUS_DIR/*`` as an argument, telling
    libFuzzer to execute all of the inputs in ``$CORPUS_DIR`` and exit. Once the
    process exits, you should find a file, ``default.profraw``, in the working
    directory.
 4. Execute these commands to generate a detailed HTML coverage report::

      llvm-profdata merge -output=default.profdata default.profraw
      llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
          --format html -output-dir=/path/to/output/report

Adding a new fuzzer
-------------------

Coverage over virtual devices can be improved by adding additional fuzzers.
Fuzzers are kept in ``tests/qtest/fuzz/`` and should be added to
``tests/qtest/fuzz/meson.build``.

Fuzzers can rely on both qtest and libqos to communicate with virtual devices.

1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.

2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
   for reference.

3. Add the fuzzer to ``tests/qtest/fuzz/meson.build``.

Fuzzers can be more-or-less thought of as special qtest programs which can
modify the qtest commands and/or qtest command arguments based on inputs
provided by libFuzzer. libFuzzer passes a byte array and length. Commonly the
fuzzer loops over the byte array, interpreting it as a list of qtest commands,
addresses, or values.
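
The loop described above might look roughly like the following sketch. It is
deliberately self-contained: ``toy_writel()`` and ``toy_readl()`` are
hypothetical stand-ins for the libqtest calls a real fuzzer would make, and the
9-byte command layout (one opcode byte, a 4-byte address, a 4-byte value) is an
assumption chosen for illustration, not a format used by QEMU:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-ins for qtest accessors; they only count calls. */
    static unsigned n_writes, n_reads;

    static void toy_writel(uint32_t addr, uint32_t val)
    {
        (void)addr;
        (void)val;
        n_writes++;
    }

    static void toy_readl(uint32_t addr)
    {
        (void)addr;
        n_reads++;
    }

    enum { OP_WRITE, OP_READ, OP_COUNT };

    /* Assumed layout: 1 opcode byte, 4 address bytes, 4 value bytes. */
    #define CMD_LEN 9

    static void toy_fuzz(const uint8_t *data, size_t size)
    {
        while (size >= CMD_LEN) {
            uint32_t addr, val;

            /* memcpy avoids unaligned loads from the input buffer. */
            memcpy(&addr, data + 1, sizeof(addr));
            memcpy(&val, data + 5, sizeof(val));

            /* The modulo maps every possible byte to a valid opcode. */
            switch (data[0] % OP_COUNT) {
            case OP_WRITE:
                toy_writel(addr, val);
                break;
            case OP_READ:
                toy_readl(addr);
                break;
            }

            data += CMD_LEN;
            size -= CMD_LEN;
        }
        /* Trailing bytes that do not form a full command are ignored. */
    }

Mapping every byte value to some valid command, rather than rejecting unknown
opcodes, means that mutated inputs still exercise the device instead of being
discarded early.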

The Generic Fuzzer
------------------

Writing a fuzz target can be a lot of effort (especially if a device driver has
not been built out within libqos). Many devices can be fuzzed to some degree,
without any device-specific code, using the generic-fuzz target.

The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
and DMA input-spaces. To apply the generic-fuzz target to a device, we need to
define two env-variables, at minimum:

* ``QEMU_FUZZ_ARGS=`` is the set of QEMU arguments used to configure a machine, with
  the device attached. For example, if we want to fuzz the virtio-net device
  attached to a pc-i440fx machine, we can specify::

    QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
    -device virtio-net,netdev=user0"

* ``QEMU_FUZZ_OBJECTS=`` is a set of space-delimited strings used to identify
  the MemoryRegions that will be fuzzed. These strings are compared against
  MemoryRegion names and MemoryRegion owner names, to decide whether each
  MemoryRegion should be fuzzed. These strings support globbing. For the
  virtio-net example, we could use one of ::

    QEMU_FUZZ_OBJECTS='virtio-net'
    QEMU_FUZZ_OBJECTS='virtio*'
    QEMU_FUZZ_OBJECTS='virtio* pcspk' # Fuzz the virtio devices and the speaker
    QEMU_FUZZ_OBJECTS='*' # Fuzz the whole machine

The ``"info mtree"`` and ``"info qom-tree"`` monitor commands can be especially
useful for identifying the ``MemoryRegion`` and ``Object`` names used for
matching.
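
The matching rule for ``QEMU_FUZZ_OBJECTS`` can be approximated as follows.
This is a sketch under assumptions: it uses POSIX ``fnmatch()`` as a stand-in
for whatever pattern matcher QEMU actually uses, ``name_matches()`` is a
hypothetical helper, and patterns longer than 63 characters are simply skipped:

.. code-block:: c

    #include <fnmatch.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * Return true if `name` matches any space-delimited glob in `patterns`.
     * fnmatch() is a POSIX stand-in here, not necessarily what QEMU uses.
     */
    static bool name_matches(const char *patterns, const char *name)
    {
        const char *p = patterns;

        while (*p) {
            char glob[64];
            size_t len;

            p += strspn(p, " ");      /* skip separators */
            len = strcspn(p, " ");    /* length of the next pattern */
            if (len == 0) {
                break;                /* only trailing spaces were left */
            }
            if (len < sizeof(glob)) { /* over-long patterns are skipped */
                memcpy(glob, p, len);
                glob[len] = '\0';
                if (fnmatch(glob, name, 0) == 0) {
                    return true;
                }
            }
            p += len;
        }
        return false;
    }

For instance, with the pattern string ``virtio* pcspk``, the names
``virtio-net-pci`` and ``pcspk`` would match while ``e1000`` would not.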

As a generic rule-of-thumb, the more ``MemoryRegions``/Devices we match, the
greater the input-space, and the smaller the probability of finding crashing
inputs for individual devices. As such, it is usually a good idea to limit the
fuzzer to only a few ``MemoryRegions``.

To ensure that these env variables have been configured correctly, we can use::

    ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0

The output should contain a complete list of matched MemoryRegions.

OSS-Fuzz
--------

QEMU is continuously fuzzed on `OSS-Fuzz
<https://github.com/google/oss-fuzz>`_.  By default, the OSS-Fuzz build
will try to fuzz every fuzz-target. Since the generic-fuzz target
requires additional information provided in environment variables, we
pre-define some generic-fuzz configs in
``tests/qtest/fuzz/generic_fuzz_configs.h``. Each config must specify:

- ``.name``: To identify the fuzzer config

- ``.args`` OR ``.argfunc``: A string or pointer to a function returning a
  string.  These strings are used to specify the ``QEMU_FUZZ_ARGS``
  environment variable.  ``argfunc`` is useful when the config relies on e.g.
  a dynamically created temp directory, or a free tcp/udp port.

- ``.objects``: A string that specifies the ``QEMU_FUZZ_OBJECTS`` environment
  variable.

To fuzz additional devices/device configurations on OSS-Fuzz, send patches for
either a new device-specific fuzzer or a new generic-fuzz config.
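
To illustrate the shape of such a config, here is a self-contained sketch.
The ``toy_fuzz_config`` type, the ``toy_`` helpers and both entries are
hypothetical; the real definition in ``generic_fuzz_configs.h`` may use
different types and fields, so treat this only as a picture of how ``.args``
and ``.argfunc`` relate:

.. code-block:: c

    /* Stand-in for the config type; the real definition may differ. */
    typedef struct {
        const char *name;
        const char *args;             /* static QEMU command line, or NULL */
        const char *(*argfunc)(void); /* builds the command line at run time */
        const char *objects;
    } toy_fuzz_config;

    /* Hypothetical argfunc; a real one might create a temp directory first. */
    static const char *toy_dynamic_args(void)
    {
        return "-M pc -nodefaults";
    }

    static const toy_fuzz_config toy_configs[] = {
        {
            .name = "toy-virtio-net",
            .args = "-M pc -nodefaults -netdev user,id=user0 "
                    "-device virtio-net,netdev=user0",
            .objects = "virtio*",
        },
        {
            .name = "toy-dynamic",
            .argfunc = toy_dynamic_args,
            .objects = "*",
        },
    };

    /* Pick whichever of .args / .argfunc the config provides. */
    static const char *toy_config_args(const toy_fuzz_config *c)
    {
        return c->argfunc ? c->argfunc() : c->args;
    }

The string returned by ``toy_config_args()`` plays the role of
``QEMU_FUZZ_ARGS``, and ``.objects`` plays the role of ``QEMU_FUZZ_OBJECTS``.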

Build details:

- The Dockerfile that sets up the environment for building QEMU's
  fuzzers on OSS-Fuzz can be found in the `OSS-Fuzz repository
  <https://github.com/google/oss-fuzz/blob/master/projects/qemu/Dockerfile>`__.

- The script responsible for building the fuzzers can be found in the
  QEMU source tree at ``scripts/oss-fuzz/build.sh``.

Building Crash Reproducers
--------------------------

When we find a crash, we should try to create an independent reproducer that
can be used on a non-fuzzer build of QEMU. This filters out any potential
false-positives, and improves the debugging experience for developers.
Here are the steps for building a reproducer for a crash found by the
generic-fuzz target.

- Ensure the crash reproduces::

    qemu-fuzz-i386 --fuzz-target... ./crash-...

- Gather the QTest output for the crash::

    QEMU_FUZZ_TIMEOUT=0 QTEST_LOG=1 FUZZ_SERIALIZE_QTEST=1 \
    qemu-fuzz-i386 --fuzz-target... ./crash-... &> /tmp/trace

- Reorder and clean up the resulting trace::

    scripts/oss-fuzz/reorder_fuzzer_qtest_trace.py /tmp/trace > /tmp/reproducer

- Get the arguments needed to start qemu, and provide a path to qemu::

    less /tmp/trace # The args should be logged at the top of this file
    export QEMU_ARGS="-machine ..."
    export QEMU_PATH="path/to/qemu-system"

- Ensure the crash reproduces in qemu-system::

    $QEMU_PATH $QEMU_ARGS -qtest stdio < /tmp/reproducer

- From the crash output, obtain some string that identifies the crash. This
  can be a line in the stack-trace, for example::

    export CRASH_TOKEN="hw/usb/hcd-xhci.c:1865"

- Minimize the reproducer::

    scripts/oss-fuzz/minimize_qtest_trace.py -M1 -M2 \
      /tmp/reproducer /tmp/reproducer-minimized

- Confirm that the minimized reproducer still crashes::

    $QEMU_PATH $QEMU_ARGS -qtest stdio < /tmp/reproducer-minimized

- Create a one-liner reproducer that can be sent over email::

    ./scripts/oss-fuzz/output_reproducer.py -bash /tmp/reproducer-minimized

- Output the C source code for a test case that will reproduce the bug::

    ./scripts/oss-fuzz/output_reproducer.py -owner "John Smith <john@smith.com>" \
      -name "test_function_name" /tmp/reproducer-minimized

- Report the bug and send a patch with the C reproducer upstream.

Implementation Details / Fuzzer Lifecycle
-----------------------------------------

The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
own ``main()``, which performs some setup, and calls the entrypoints:

``LLVMFuzzerInitialize``: called prior to fuzzing. Used to initialize all of the
necessary state.

``LLVMFuzzerTestOneInput``: called for each fuzzing run. Processes the input and
resets the state at the end of each run.

In more detail:

``LLVMFuzzerInitialize`` parses the arguments to the fuzzer (must start with two
dashes, so they are ignored by libFuzzer's ``main()``). Currently, the arguments
select the fuzz target. Then, the qtest client is initialized. If the target
requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
Then the QGraph is walked and the QEMU cmd_line is determined and saved.

After this, ``vl.c:main`` is called to set up the guest. There are
target-specific hooks that can be called before and after main, for
additional setup (e.g. PCI setup, or VM snapshotting).

``LLVMFuzzerTestOneInput``: Uses qtest/qos functions to act based on the fuzz
input. It is also responsible for manually calling ``main_loop_wait`` to ensure
that bottom halves are executed, and for any cleanup required before the next
input.
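
The division of labour between the two entrypoints can be sketched as below.
This is a simplified illustration, not QEMU's implementation: the
``toy_target`` table, ``toy_scsi_fuzz()`` and the ``selected`` pointer are
hypothetical, and all guest setup is omitted. Only the two ``LLVMFuzzer*``
signatures and the ``--fuzz-target=`` option come from the text above:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical target table; QEMU's real registration differs. */
    typedef struct {
        const char *name;
        void (*fuzz)(const uint8_t *data, size_t size);
    } toy_target;

    static unsigned toy_runs;

    static void toy_scsi_fuzz(const uint8_t *data, size_t size)
    {
        (void)data;
        (void)size;
        toy_runs++;
    }

    static const toy_target toy_targets[] = {
        { "toy-scsi-fuzz", toy_scsi_fuzz },
    };
    static const toy_target *selected;

    /* Called once by libFuzzer's main() before any input is run. */
    int LLVMFuzzerInitialize(int *argc, char ***argv)
    {
        static const char opt[] = "--fuzz-target=";
        const size_t n = sizeof(toy_targets) / sizeof(toy_targets[0]);

        for (int i = 1; i < *argc; i++) {
            const char *arg = (*argv)[i];

            /* Only "--" options are ours; the rest belong to libFuzzer. */
            if (strncmp(arg, opt, strlen(opt)) != 0) {
                continue;
            }
            for (size_t t = 0; t < n; t++) {
                if (strcmp(arg + strlen(opt), toy_targets[t].name) == 0) {
                    selected = &toy_targets[t];
                }
            }
        }
        return 0;
    }

    /* Called once per input; dispatches to the selected target. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        if (selected) {
            selected->fuzz(data, size);
        }
        return 0;
    }

In QEMU, the per-input function would additionally run the main loop and
perform whatever cleanup the target needs, as described above.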

Since the same process is reused for many fuzzing runs, QEMU state needs to
be reset at the end of each run. For example, this can be done by rebooting the
VM after each run.

  - *Pros*: Straightforward and fast for simple fuzz targets.

  - *Cons*: Depending on the device, does not reset all device state. If the
    device requires some initialization prior to being ready for fuzzing (common
    for QOS-based targets), this initialization needs to be done after each
    reboot.

  - *Example target*: ``i440fx-qtest-reboot-fuzz``