Block I/O error injection using ``blkdebug``
============================================

..
   Copyright (C) 2014-2015 Red Hat Inc

   This work is licensed under the terms of the GNU GPL, version 2 or later.  See
   the COPYING file in the top-level directory.

The ``blkdebug`` block driver is a rule-based error injection engine.  It can be
used to exercise error code paths in block drivers including ``ENOSPC`` (out of
space) and ``EIO``.

This document gives an overview of the features available in ``blkdebug``.

Background
----------
Block drivers have many error code paths that handle I/O errors.  Image formats
are especially complex since metadata I/O errors during cluster allocation or
while updating tables happen halfway through request processing and require
discipline to keep image files consistent.

Error injection allows test cases to trigger I/O errors at specific points.
This way, all error paths can be tested to make sure they are correct.

Rules
-----
The ``blkdebug`` block driver takes a list of "rules" that tell the error injection
engine when to fail an I/O request.

Each I/O request is evaluated against the rules.  If a rule matches the request,
its "action" is executed.

Rules can be placed in a configuration file; the configuration file
follows the same .ini-like format used by QEMU's ``-readconfig`` option, and
each section of the file represents a rule.

The following configuration file defines a single rule::

  $ cat blkdebug.conf
  [inject-error]
  event = "read_aio"
  errno = "28"

This rule fails all aio read requests with ``ENOSPC`` (28).  Note that the errno
value depends on the host.  On Linux, see
``/usr/include/asm-generic/errno-base.h`` for errno values.

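Because the numbers are host-specific, it can be simpler to ask the host than to
read the header.  The following minimal sketch uses Python's standard ``errno``
module, whose constants carry the host's values; the helper name
``errno_value`` is hypothetical and not part of QEMU:

.. code-block:: python

   import errno


   def errno_value(name):
       """Return the host's numeric value for an errno name such as "ENOSPC".

       Hypothetical helper: it only looks the name up in Python's errno module.
       """
       value = getattr(errno, name, None)
       if not isinstance(value, int):
           raise ValueError(f"unknown errno name: {name}")
       return value


   if __name__ == "__main__":
       # Print lines suitable for the "errno" attribute of a blkdebug rule.
       for name in ("ENOSPC", "EIO"):
           print(f'{name}: errno = "{errno_value(name)}"')

On a Linux host this prints ``errno = "28"`` for ``ENOSPC`` and ``errno = "5"``
for ``EIO``, the values used throughout this document.
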
Invoke QEMU as follows::

  $ qemu-system-x86_64 \
        -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
        -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0

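Test scripts that need many rules may prefer to generate the configuration file.
This is a hedged sketch, not a QEMU tool: the helpers ``render_blkdebug_conf``
and ``blkdebug_drive_arg`` are hypothetical, and they only reproduce the quoted
``key = "value"`` layout and the ``blkdebug:blkdebug.conf:test.img`` filename
shown above:

.. code-block:: python

   def render_blkdebug_conf(rules):
       """Render (section, {key: value}) pairs in the .ini-like rule format.

       Hypothetical helper; each pair becomes one rule section.
       """
       lines = []
       for section, attrs in rules:
           lines.append(f"[{section}]")
           for key, value in attrs.items():
               lines.append(f'{key} = "{value}"')  # values quoted as above
           lines.append("")  # blank line ends each section
       return "\n".join(lines)


   def blkdebug_drive_arg(conf_path, image_path, drive_id="drive0"):
       """Build a -drive value that layers blkdebug over an image (hypothetical)."""
       return (f"if=none,cache=none,"
               f"file=blkdebug:{conf_path}:{image_path},id={drive_id}")


   if __name__ == "__main__":
       print(render_blkdebug_conf([
           ("inject-error", {"event": "read_aio", "errno": "28"}),
       ]))
       print(blkdebug_drive_arg("blkdebug.conf", "test.img"))

The rendered text matches the ``blkdebug.conf`` example, and the printed value
matches the ``-drive`` option in the command line above.  Values are always
quoted, following the example configuration file.
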
Rules support the following attributes:

``event``
  which type of operation to match (e.g. ``read_aio``, ``write_aio``,
  ``flush_to_os``, ``flush_to_disk``).  See `Events`_ for
  information on events.

``state``
  (optional) the engine must be in this state number in order for this
  rule to match.  See `State transitions`_ for information
  on states.

``errno``
  the numeric errno value to return when a request matches this rule.
  The errno values depend on the host since the numeric values are not
  standardized in the POSIX specification.

``sector``
  (optional) a sector number that the request must overlap in order to
  match this rule.

``once``
  (optional, default ``off``) only execute this action on the first
  matching request.

``immediately``
  (optional, default ``off``) return a NULL ``BlockAIOCB``
  pointer and fail without an errno instead.  This
  exercises the code path where ``BlockAIOCB`` fails and the
  caller's ``BlockCompletionFunc`` is not invoked.

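How ``event``, ``state``, ``sector`` and ``once`` combine can be summarized
with a toy model.  This is a simplified sketch, not QEMU's implementation
(which lives in ``block/blkdebug.c`` and differs in detail): ``InjectErrorRule``
and ``check_request`` are hypothetical names, a request is described by its
event and sector range, the first matching rule wins, and ``immediately`` is
not modeled:

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class InjectErrorRule:
       """Hypothetical stand-in for one [inject-error] section."""
       event: str
       errno: int
       state: int = 0                # 0: match in any state
       sector: Optional[int] = None  # None: match any sector
       once: bool = False


   def check_request(rules, state, event, first_sector, nb_sectors):
       """Return the errno to inject for a request, or 0 to let it succeed."""
       for rule in rules:
           if rule.event != event:
               continue
           if rule.state and rule.state != state:
               continue
           # The request covers sectors [first_sector, first_sector + nb_sectors).
           if rule.sector is not None and not (
               first_sector <= rule.sector < first_sector + nb_sectors
           ):
               continue
           if rule.once:
               rules.remove(rule)  # a "once" rule fires a single time
           return rule.errno
       return 0


   if __name__ == "__main__":
       rules = [InjectErrorRule(event="write_aio", errno=5, sector=100, once=True)]
       results = [
           check_request(rules, 1, "write_aio", 0, 8),   # sectors 0-7
           check_request(rules, 1, "write_aio", 96, 8),  # sectors 96-103
           check_request(rules, 1, "write_aio", 96, 8),  # same write again
       ]
       print(results)  # [0, 5, 0]

The single ``once`` rule lets the non-overlapping write through, fails the
write covering sector 100 with ``EIO``, and is then gone, so repeating that
write succeeds.
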
Events
------
Block drivers provide information about the type of I/O request they are about
to make, so rules can match specific types of requests.  For example, the ``qcow2``
block driver tells ``blkdebug`` when it accesses the L1 table so rules can match
only L1 table accesses and not other metadata or guest data requests.

The core events are:

``read_aio``
  guest data read

``write_aio``
  guest data write

``flush_to_os``
  write out unwritten block driver state (e.g. cached metadata)

``flush_to_disk``
  flush the host block device's disk cache

See ``qapi/block-core.json:BlkdebugEvent`` for the full list of events.
You may need to grep block driver source code to understand the
meaning of specific events.

State transitions
-----------------
There are cases where more power is needed to match a particular I/O request in
a longer sequence of requests.  For example::

  write_aio
  flush_to_disk
  write_aio

How do we match the 2nd ``write_aio`` but not the first?  This is where state
transitions come in.

The error injection engine has an integer called the "state" that always starts
at 1.  The state integer is internal to ``blkdebug`` and cannot be
observed from outside, but rules can interact with it for powerful matching
behavior.

Rules can be conditional on the current state, and they can transition to a new
state.

When a rule's ``state`` attribute is non-zero, the current state must equal
the attribute in order for the rule to match.

For example, to match the 2nd ``write_aio``::

  [set-state]
  event = "write_aio"
  state = "1"
  new_state = "2"

  [inject-error]
  event = "write_aio"
  state = "2"
  errno = "5"

The first ``write_aio`` request matches the ``set-state`` rule and transitions from
state 1 to state 2.  Once state 2 has been entered, the ``set-state`` rule no
longer matches since it requires state 1.  But the ``inject-error`` rule now
matches the next ``write_aio`` request and injects ``EIO`` (5).

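The same sequence can be traced with a toy state machine.  This is a
hypothetical Python model, not QEMU code; ``SetStateRule``, ``InjectErrorRule``
and ``BlkdebugModel`` are illustrative names, and the model assumes that every
rule for an event is checked against the state as it was before that event,
with any ``set-state`` transition taking effect only after all rules have been
evaluated:

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class SetStateRule:
       """Hypothetical stand-in for one [set-state] section."""
       event: str
       new_state: int
       state: int = 0  # 0: match in any state


   @dataclass
   class InjectErrorRule:
       """Hypothetical stand-in for one [inject-error] section."""
       event: str
       errno: int
       state: int = 0  # 0: match in any state


   class BlkdebugModel:
       """Toy model of the blkdebug state machine (not QEMU code)."""

       def __init__(self, rules):
           self.rules = rules
           self.state = 1  # the engine starts in state 1

       def fire(self, event):
           """Process one event; return the injected errno, or 0 for success."""
           old_state = self.state
           new_state = old_state
           err = 0
           for rule in self.rules:
               if rule.event != event:
                   continue
               if rule.state and rule.state != old_state:
                   continue
               if isinstance(rule, SetStateRule):
                   new_state = rule.new_state
               elif err == 0:
                   err = rule.errno  # first matching inject-error rule wins
           # Assumption: rules were all checked against old_state; switch now.
           self.state = new_state
           return err


   if __name__ == "__main__":
       model = BlkdebugModel([
           SetStateRule(event="write_aio", state=1, new_state=2),
           InjectErrorRule(event="write_aio", state=2, errno=5),
       ])
       events = ["write_aio", "flush_to_disk", "write_aio", "write_aio"]
       print([model.fire(e) for e in events])  # [0, 0, 5, 5]

Running it prints ``[0, 0, 5, 5]``: because the ``inject-error`` rule does not
set ``once``, every later ``write_aio`` in state 2 fails as well.
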
State transition rules support the following attributes:

``event``
  which type of operation to match (e.g. ``read_aio``, ``write_aio``,
  ``flush_to_os``, ``flush_to_disk``).  See `Events`_ for
  information on events.

``state``
  (optional) the engine must be in this state number in order for this
  rule to match.

``new_state``
  transition to this state number.

Suspend and resume
------------------
Exercising code paths in block drivers may require specific ordering amongst
concurrent requests.  The "breakpoint" feature allows requests to be halted on
a ``blkdebug`` event and resumed later.  This makes it possible to achieve
deterministic ordering when multiple requests are in flight.

Breakpoints on ``blkdebug`` events are associated with a user-defined ``tag`` string.
This tag serves as an identifier by which the request can be resumed at a later
point.

See the ``qemu-io(1)`` ``break``, ``resume``, ``remove_break``, and ``wait_break``
commands for details.
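
The effect of a breakpoint on request ordering can be illustrated with a small
``asyncio`` model.  It is a hypothetical sketch, not QEMU's implementation: the
``Breakpoints`` class and its methods only loosely mirror the roles of the
``qemu-io`` ``break``, ``wait_break`` and ``resume`` commands, and the model
assumes a breakpoint is consumed by the first request that hits it.  It halts a
first ``write_aio`` request, lets a second one complete, and then resumes the
first:

.. code-block:: python

   import asyncio


   class Breakpoints:
       """Toy model of blkdebug breakpoints (not QEMU's implementation)."""

       def __init__(self):
           self._armed = {}     # event name -> tag of the armed breakpoint
           self._hit = {}       # tag -> set once a request stops at it
           self._released = {}  # tag -> set to let the stopped request go on

       def set_break(self, event, tag):
           """Arm a one-shot breakpoint on an event (cf. qemu-io "break")."""
           self._armed[event] = tag
           self._hit[tag] = asyncio.Event()

       async def reach(self, event):
           """Called by a request when it reaches a blkdebug event."""
           tag = self._armed.pop(event, None)  # assumed one-shot: disarm on hit
           if tag is None:
               return
           release = asyncio.Event()
           self._released[tag] = release
           self._hit[tag].set()
           await release.wait()

       async def wait_break(self, tag):
           """Wait until a request is halted on tag (cf. "wait_break")."""
           await self._hit[tag].wait()

       def resume(self, tag):
           """Let the request halted on tag continue (cf. "resume")."""
           self._released.pop(tag).set()


   async def demo():
       bp = Breakpoints()
       order = []
       bp.set_break("write_aio", "A")

       async def request(name):
           await bp.reach("write_aio")
           order.append(name)

       first = asyncio.create_task(request("first"))
       await bp.wait_break("A")   # the first request is now halted
       await request("second")    # breakpoint already consumed; runs through
       bp.resume("A")
       await first
       return order


   if __name__ == "__main__":
       print(asyncio.run(demo()))  # ['second', 'first']

Running it prints ``['second', 'first']``: the halted request finishes last
regardless of how the two would otherwise have been interleaved, which is the
kind of deterministic ordering breakpoints are meant to provide.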