===========================================
Fault injection capabilities infrastructure
===========================================

See also drivers/md/md-faulty.c and the "every_nth" module option for scsi_debug.


Available fault injection capabilities
--------------------------------------

- failslab

  injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...)

- fail_page_alloc

  injects page allocation failures. (alloc_pages(), get_free_pages(), ...)

- fail_usercopy

  injects failures in user memory access functions. (copy_from_user(), get_user(), ...)

- fail_futex

  injects futex deadlock and uaddr fault errors.

- fail_sunrpc

  injects kernel RPC client and server failures.

- fail_make_request

  injects disk IO errors on devices permitted by setting
  /sys/block/<device>/make-it-fail or
  /sys/block/<device>/<partition>/make-it-fail. (submit_bio_noacct())

- fail_mmc_request

  injects MMC data errors on devices permitted by setting
  debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request.

- fail_function

  injects an error return into specific functions that are marked with the
  ALLOW_ERROR_INJECTION() macro, by setting debugfs entries
  under /sys/kernel/debug/fail_function. No boot option is supported.

- NVMe fault injection

  injects an NVMe status code and retry flag on devices permitted by setting
  debugfs entries under /sys/kernel/debug/nvme*/fault_inject. The default
  status code is NVME_SC_INVALID_OPCODE with no retry. The status code and
  retry flag can be set via debugfs.

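To check which of these capabilities a running kernel exposes, one can look
for the matching fail* directories in debugfs. The following is a minimal
sketch, not a tool from the kernel tree: the function name list_fault_caps and
the DEBUGFS override are hypothetical, and debugfs is assumed to be mounted at
/sys/kernel/debug. The MMC and NVMe entries live under their device
directories and are not matched by this glob.

```shell
# Root of debugfs. The override is an assumption of this sketch; it only
# exists so the function can be tried against a scratch directory.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

# Print the name of every fail* directory found under $DEBUGFS, one per
# line. Prints nothing if debugfs is not mounted or nothing matches.
list_fault_caps()
{
	for fc_dir in "$DEBUGFS"/fail*; do
		# An unmatched glob stays literal and is skipped by the -d test.
		if [ -d "$fc_dir" ]; then
			basename "$fc_dir"
		fi
	done
	return 0
}
```

Calling list_fault_caps as root on a kernel built with fault injection support
would be expected to print names such as failslab or fail_page_alloc,
depending on the kernel configuration.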

Configure fault-injection capabilities behavior
-----------------------------------------------

debugfs entries
^^^^^^^^^^^^^^^

The fault-inject-debugfs kernel module provides debugfs entries for runtime
configuration of fault-injection capabilities.

- /sys/kernel/debug/fail*/probability:

	likelihood of failure injection, in percent.

	Format: <percent>

	Note that one-failure-per-hundred is a very high error rate
	for some testcases.  Consider setting probability=100 and configuring
	/sys/kernel/debug/fail*/interval for such testcases.

- /sys/kernel/debug/fail*/interval:

	specifies the interval between failures, for calls to
	should_fail() that pass all the other tests.

	Note that if you enable this, by setting interval>1, you will
	probably want to set probability=100.

- /sys/kernel/debug/fail*/times:

	specifies how many times failures may happen at most. A value of -1
	means "no limit".

- /sys/kernel/debug/fail*/space:

	specifies an initial resource "budget", decremented by "size"
	on each call to should_fail(,size).  Failure injection is
	suppressed until "space" reaches zero.

- /sys/kernel/debug/fail*/verbose:

	Format: { 0 | 1 | 2 }

	specifies the verbosity of the messages when failure is
	injected.  '0' means no messages; '1' will print only a single
	log line per failure; '2' will print a call trace too -- useful
	to debug the problems revealed by fault injection.

- /sys/kernel/debug/fail*/task-filter:

	Format: { 'Y' | 'N' }

	A value of 'N' disables filtering by process (default).
	A value of 'Y' limits failures to only processes indicated by
	/proc/<pid>/make-it-fail==1.

- /sys/kernel/debug/fail*/require-start,
  /sys/kernel/debug/fail*/require-end,
  /sys/kernel/debug/fail*/reject-start,
  /sys/kernel/debug/fail*/reject-end:

	specifies the range of virtual addresses tested during
	stacktrace walking.  Failure is injected only if some caller
	in the walked stacktrace lies within the required range, and
	none lies within the rejected range.
	Default required range is [0,ULONG_MAX) (whole of virtual address space).
	Default rejected range is [0,0).

- /sys/kernel/debug/fail*/stacktrace-depth:

	specifies the maximum stacktrace depth walked during search
	for a caller within [require-start,require-end) OR
	[reject-start,reject-end).
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:

	Format: { 'Y' | 'N' }

	The default is 'Y'. Setting it to 'N' also injects failures into
	highmem/user allocations (__GFP_HIGHMEM allocations).

- /sys/kernel/debug/failslab/ignore-gfp-wait,
  /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:

	Format: { 'Y' | 'N' }

	The default is 'Y'. Setting it to 'N' also injects failures
	into allocations that can sleep (__GFP_DIRECT_RECLAIM allocations).

- /sys/kernel/debug/fail_page_alloc/min-order:

	specifies the minimum page allocation order into which failures
	are injected.

- /sys/kernel/debug/fail_futex/ignore-private:

	Format: { 'Y' | 'N' }

	The default is 'N'. Setting it to 'Y' disables failure injection
	when dealing with private (address space) futexes.

- /sys/kernel/debug/fail_sunrpc/ignore-client-disconnect:

	Format: { 'Y' | 'N' }

	The default is 'N'. Setting it to 'Y' disables disconnect
	injection on the RPC client.

- /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect:

	Format: { 'Y' | 'N' }

	The default is 'N'. Setting it to 'Y' disables disconnect
	injection on the RPC server.

- /sys/kernel/debug/fail_sunrpc/ignore-cache-wait:

	Format: { 'Y' | 'N' }

	The default is 'N'. Setting it to 'Y' disables cache wait
	injection on the RPC server.

- /sys/kernel/debug/fail_function/inject:

	Format: { 'function-name' | '!function-name' | '' }

	specifies the target function of error injection by name.
	If the function name has a '!' prefix, the given function is
	removed from the injection list. If nothing is specified (''),
	the injection list is cleared.

- /sys/kernel/debug/fail_function/injectable:

	(read only) shows error injectable functions and what type of
	error values can be specified. The error type will be one of
	the following:

	- NULL:	retval must be 0.
	- ERRNO: retval must be -1 to -MAX_ERRNO (-4095).
	- ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4095).

- /sys/kernel/debug/fail_function/<function-name>/retval:

	specifies the "error" return value to inject into the given function.
	This file is created when the user specifies a new injection entry.
	Note that this file only accepts unsigned values. So, to use a
	negative errno, it is better to use 'printf' instead of 'echo', e.g.::

		$ printf %#x -12 > retval

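The generic attributes above are usually set together. The following is a
minimal sketch of one way to do that from a script; set_fault_attr,
fail_every_nth and the DEBUGFS override are hypothetical names introduced
here, and debugfs is assumed to be mounted at /sys/kernel/debug. It follows
the note above that an interval is best combined with probability=100.

```shell
# Root of debugfs. The override is an assumption of this sketch, used
# only so the helpers can be tried against a scratch directory.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

# set_fault_attr <capability> <attribute> <value>
# Write one value to one attribute file. Fails with a message if the file
# is missing (capability not built in, or debugfs not mounted).
set_fault_attr()
{
	fa_file="$DEBUGFS/$1/$2"
	if [ ! -e "$fa_file" ]; then
		echo "no such attribute: $fa_file" >&2
		return 1
	fi
	printf '%s\n' "$3" > "$fa_file"
}

# fail_every_nth <capability> <n>
# Fail every n-th eligible call, with no limit on the number of failures
# and no initial "space" budget. probability is written last so that,
# starting from a disabled (probability=0) state, injection only begins
# once the other attributes are in place.
fail_every_nth()
{
	set_fault_attr "$1" interval "$2" &&
	set_fault_attr "$1" times -1 &&
	set_fault_attr "$1" space 0 &&
	set_fault_attr "$1" probability 100
}
```

For example, "fail_every_nth failslab 50" would request a slab allocation
failure on every 50th call that passes the other filters. Writing 0 to
probability afterwards turns injection off again.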
Boot option
^^^^^^^^^^^

In order to inject faults while debugfs is not available (early boot time),
use the following boot options::

	failslab=
	fail_page_alloc=
	fail_usercopy=
	fail_make_request=
	fail_futex=
	mmc_core.fail_request=<interval>,<probability>,<space>,<times>

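As an illustration, the value of such an option can be assembled as below.
This is only a sketch: fault_boot_param is a hypothetical helper, and it
assumes that the other options accept the same
<interval>,<probability>,<space>,<times> format that is shown for
mmc_core.fail_request.

```shell
# fault_boot_param <option> <interval> <probability> <space> <times>
# Print a kernel command line fragment for one fault injection option.
# Assumption: every option listed above takes the four comma-separated
# fields in this order.
fault_boot_param()
{
	# probability is a percentage, so reject values outside 0..100
	if [ "$3" -lt 0 ] || [ "$3" -gt 100 ]; then
		echo "probability must be between 0 and 100" >&2
		return 1
	fi
	printf '%s=%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, "fault_boot_param failslab 10 100 0 -1" prints
"failslab=10,100,0,-1", which can then be appended to the kernel command
line.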
proc entries
^^^^^^^^^^^^

- /proc/<pid>/fail-nth,
  /proc/self/task/<tid>/fail-nth:

	Writing an integer N to this file makes the N-th call in the task fail.
	Reading from this file returns an integer value. A value of '0' indicates
	that the fault set up by a previous write to this file was injected.
	A positive integer N indicates that the fault has not yet been injected.
	Note that this file enables all types of faults (slab, futex, etc).
	This setting takes precedence over all other generic debugfs settings
	like probability, interval, times, etc. But per-capability settings
	(e.g. fail_futex/ignore-private) take precedence over it.

	This feature is intended for systematic testing of faults in a single
	system call. See the example in "Systematic faults using fail-nth" below.

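The value read back from a fail-nth file can be interpreted as described
above. The following is a small sketch of that interpretation;
fail_nth_state is a hypothetical helper and takes the path of the fail-nth
file being watched as its only argument. The result is only meaningful after
a fault has been armed by a previous write.

```shell
# fail_nth_state <path-to-fail-nth-file>
# Print "injected" if the file reads 0 (the armed fault was injected)
# and "pending" if it reads a positive value (not injected yet).
fail_nth_state()
{
	fn_val=$(cat "$1") || return 1
	if [ "$fn_val" -eq 0 ]; then
		echo injected
	else
		echo pending
	fi
}
```

A test harness could call this after each system call under test and stop
iterating once it prints "pending".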

Error Injectable Functions
--------------------------

This part is for kernel developers who are considering marking a function
with the ALLOW_ERROR_INJECTION() macro.

Requirements for the Error Injectable Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since function-level error injection forcibly changes the code path
and returns an error even if the input and conditions are proper, it can
cause an unexpected kernel crash if you allow error injection on a function
which is NOT error injectable. Thus, you (and reviewers) must ensure that:

- The function returns an error code if it fails, and the callers must check
  it correctly (need to recover from it).

- The function does not execute any code which can change any state before
  the first error return. The state includes global or local, or input
  variable. For example, clearing output address storage (e.g. `*ret = NULL`),
  incrementing/decrementing a counter, setting a flag, disabling preemption
  or irqs, or taking a lock (if those are recovered before returning the
  error, that will be OK.)

The first requirement is important. It means that release (free object)
functions are usually harder to inject errors into than allocation
functions. If errors from such release functions are not handled correctly,
they easily cause a memory leak (the caller will be confused about whether
the object has been released or corrupted.)

The second requirement is for callers which expect that the function always
does something. If error injection skips the whole function, that
expectation is violated and causes an unexpected error.

Type of the Error Injectable Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each error injectable function has an error type specified by the
ALLOW_ERROR_INJECTION() macro. You have to choose it carefully if you add
a new error injectable function. If the wrong error type is chosen, the
kernel may crash because it may not be able to handle the error.
There are 4 types of errors defined in include/asm-generic/error-injection.h:

EI_ETYPE_NULL
  This function will return `NULL` if it fails, e.g. a function that returns
  an allocated object address.

EI_ETYPE_ERRNO
  This function will return an `-errno` error code if it fails, e.g. return
  -EINVAL if the input is wrong. This includes functions which return
  an address that encodes `-errno` with the ERR_PTR() macro.

EI_ETYPE_ERRNO_NULL
  This function will return an `-errno` or `NULL` if it fails. If the caller
  of this function checks the return value with the IS_ERR_OR_NULL() macro,
  this type will be appropriate.

EI_ETYPE_TRUE
  This function will return `true` (non-zero positive value) if it fails.

If you specify the wrong type, for example EI_ETYPE_ERRNO for a function
which returns an allocated object, it may cause a problem because the returned
value is not an object address and the caller cannot access that address.

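From the user side, the error type limits which retval values make sense for
a function. The following is a minimal sketch of that check for the three
types listed for the "injectable" file; retval_allowed is a hypothetical
helper, the value is passed as a signed decimal, and the sketch relies on
MAX_ERRNO being 4095 as defined in include/linux/err.h. It accepts both
"ERR_NULL" and "ERRNO_NULL" as spellings of the combined type, since the
exact string printed by a given kernel is an assumption here.

```shell
# retval_allowed <type> <retval>
# <type> is the error type shown next to a function in "injectable",
# <retval> is a signed decimal integer. Returns 0 if the value fits the
# type, 1 if it does not, and 2 for a type this sketch does not know.
retval_allowed()
{
	case "$1" in
	NULL)
		[ "$2" -eq 0 ] ;;
	ERRNO)
		# -1 down to -MAX_ERRNO (4095)
		[ "$2" -lt 0 ] && [ "$2" -ge -4095 ] ;;
	ERR_NULL|ERRNO_NULL)
		[ "$2" -eq 0 ] || { [ "$2" -lt 0 ] && [ "$2" -ge -4095 ]; } ;;
	*)
		return 2 ;;
	esac
}
```

For example, "retval_allowed ERRNO -12" succeeds, while
"retval_allowed ERRNO 0" fails because 0 is not an error code for that type.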

How to add new fault injection capability
-----------------------------------------

- #include <linux/fault-inject.h>

- define the fault attributes

  DECLARE_FAULT_ATTR(name);

  Please see the definition of struct fault_attr in fault-inject.h
  for details.

- provide a way to configure fault attributes

  - boot option

    If you need to enable the fault injection capability from boot time, you
    can provide a boot option to configure it. There is a helper function
    for it:

	setup_fault_attr(attr, str);

  - debugfs entries

    failslab, fail_page_alloc, fail_usercopy, and fail_make_request use this
    method. Helper function:

	fault_create_debugfs_attr(name, parent, attr);

  - module parameters

    If the scope of the fault injection capability is limited to a
    single kernel module, it is better to provide module parameters to
    configure the fault attributes.

- add a hook to insert failures

  Upon should_fail() returning true, client code should inject a failure:

	should_fail(attr, size);

Application Examples
--------------------

- Inject slab allocation failures into module init/exit code::

    #!/bin/bash

    FAILTYPE=failslab
    echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 10 > /sys/kernel/debug/$FAILTYPE/probability
    echo 100 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
    echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait

    # Run a command with make-it-fail set, so that task-filter=Y
    # restricts injection to that command.
    faulty_system()
    {
	bash -c "echo 1 > /proc/self/make-it-fail && exec $*"
    }

    if [ $# -eq 0 ]
    then
	echo "Usage: $0 modulename [ modulename ... ]"
	exit 1
    fi

    for m in "$@"
    do
	echo "inserting $m..."
	faulty_system modprobe "$m"

	echo "removing $m..."
	faulty_system modprobe -r "$m"
    done

------------------------------------------------------------------------------

- Inject page allocation failures only for a specific module::

    #!/bin/bash

    FAILTYPE=fail_page_alloc
    module=$1

    if [ -z "$module" ]
    then
	echo "Usage: $0 <modulename>"
	exit 1
    fi

    modprobe "$module"

    if [ ! -d "/sys/module/$module/sections" ]
    then
	echo "Module $module is not loaded"
	exit 1
    fi

    # Restrict injection to callers located between the start of the
    # module's .text section and the start of its .data section.
    cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
    cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end

    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 10 > /sys/kernel/debug/$FAILTYPE/probability
    echo 100 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
    echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
    echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
    echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth

    # Turn injection off again when the script is interrupted or exits.
    trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT

    echo "Injecting errors into the module $module... (interrupt to stop)"
    sleep 1000000

------------------------------------------------------------------------------

- Inject an open_ctree error during a btrfs mount::

    #!/bin/bash

    rm -f testfile.img
    dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
    DEVICE=$(losetup --show -f testfile.img)
    mkfs.btrfs -f $DEVICE
    mkdir -p tmpmnt

    FAILTYPE=fail_function
    FAILFUNC=open_ctree
    echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject
    # retval only accepts unsigned values, so pass -ENOMEM (-12) as hex
    printf %#x -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval
    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 100 > /sys/kernel/debug/$FAILTYPE/probability
    echo 0 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 1 > /sys/kernel/debug/$FAILTYPE/verbose

    # The mount is expected to fail because of the injected error.
    mount -t btrfs $DEVICE tmpmnt
    if [ $? -ne 0 ]
    then
	echo "SUCCESS!"
    else
	echo "FAILED!"
	umount tmpmnt
    fi

    # Clear the injection list.
    echo > /sys/kernel/debug/$FAILTYPE/inject

    rmdir tmpmnt
    losetup -d $DEVICE
    rm testfile.img

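The fail_function steps used in the last example can be factored into small
helpers. This is a sketch under stated assumptions rather than an existing
tool: errno_retval, inject_function_error, clear_function_errors and the
DEBUGFS override are hypothetical names, and debugfs is assumed to be mounted
at /sys/kernel/debug. It uses only the inject and retval files described
earlier.

```shell
# Root of debugfs. The override is an assumption of this sketch, used
# only so the helpers can be tried against a scratch directory.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

# errno_retval <errno>
# Print -<errno> as the unsigned hex value that the retval file accepts,
# using the same printf conversion as the examples above.
errno_retval()
{
	printf '%#x\n' "-$1"
}

# inject_function_error <function> <errno>
# Add <function> to the injection list and make it return -<errno>.
# On a real kernel, writing to "inject" creates the <function>/ directory.
inject_function_error()
{
	ff_dir="$DEBUGFS/fail_function"
	echo "$1" > "$ff_dir/inject" || return 1
	errno_retval "$2" > "$ff_dir/$1/retval"
}

# clear_function_errors
# Writing an empty string to "inject" clears the injection list.
clear_function_errors()
{
	echo > "$DEBUGFS/fail_function/inject"
}
```

With these helpers, the setup in the btrfs example would become
"inject_function_error open_ctree 12", followed by
"clear_function_errors" once the test has finished. The generic attributes
such as probability and times still have to be set separately.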

Tool to run command with failslab or fail_page_alloc
----------------------------------------------------

To make the tasks described above easier, use
tools/testing/fault-injection/failcmd.sh.  Run
"./tools/testing/fault-injection/failcmd.sh --help" for more information and
see the following examples.

Examples:

Run the command "make -C tools/testing/selftests/ run_tests" while injecting
slab allocation failures::

	# ./tools/testing/fault-injection/failcmd.sh \
		-- make -C tools/testing/selftests/ run_tests

Same as above, but allow at most 100 failures instead of the default of at
most one::

	# ./tools/testing/fault-injection/failcmd.sh --times=100 \
		-- make -C tools/testing/selftests/ run_tests

Same as above, but inject page allocation failures instead of slab
allocation failures::

	# env FAILCMD_TYPE=fail_page_alloc \
		./tools/testing/fault-injection/failcmd.sh --times=100 \
		-- make -C tools/testing/selftests/ run_tests

Systematic faults using fail-nth
--------------------------------

The following code systematically fails the 1st, 2nd, 3rd and so on
fault-capable call in the socketpair() system call::

  #include <sys/types.h>
  #include <sys/stat.h>
  #include <sys/socket.h>
  #include <sys/syscall.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <string.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <errno.h>

  int main(void)
  {
	int i, err, res, fail_nth, fds[2];
	char buf[128];
	ssize_t n;

	system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
	sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
	fail_nth = open(buf, O_RDWR);
	if (fail_nth < 0) {
		perror("open fail-nth");
		return 1;
	}
	for (i = 1;; i++) {
		/* arm the i-th fault-capable call in this task */
		sprintf(buf, "%d", i);
		write(fail_nth, buf, strlen(buf));
		res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
		err = errno;
		/* 0: the fault was injected; positive: it was not reached */
		n = pread(fail_nth, buf, sizeof(buf) - 1, 0);
		if (n <= 0) {
			perror("pread fail-nth");
			return 1;
		}
		buf[n] = '\0';
		if (res == 0) {
			close(fds[0]);
			close(fds[1]);
		}
		printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
			res, err);
		if (atoi(buf))
			break;
	}
	return 0;
  }

An example output::

	1-th fault Y: res=-1/23
	2-th fault Y: res=-1/23
	3-th fault Y: res=-1/12
	4-th fault Y: res=-1/12
	5-th fault Y: res=-1/23
	6-th fault Y: res=-1/23
	7-th fault Y: res=-1/23
	8-th fault Y: res=-1/12
	9-th fault Y: res=-1/12
	10-th fault Y: res=-1/12
	11-th fault Y: res=-1/12
	12-th fault Y: res=-1/12
	13-th fault Y: res=-1/12
	14-th fault Y: res=-1/12
	15-th fault Y: res=-1/12
	16-th fault N: res=0/12