================
Delay accounting
================

Tasks encounter delays in execution when they wait
for some kernel resource to become available; for example, a
runnable task may wait for a free CPU to run on.

The per-task delay accounting functionality measures
the delays experienced by a task while

a) waiting for a CPU (while being runnable)
b) waiting for completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
e) thrashing
f) direct compaction
g) write-protect copy
h) IRQ/SOFTIRQ

and makes these statistics available to userspace through
the taskstats interface.

Such delays provide feedback for setting a task's CPU priority,
I/O priority and RSS limit values appropriately. Long delays for
an important task could be a trigger for raising its corresponding priority.

The functionality, through its use of the taskstats interface, also provides
delay statistics aggregated for all tasks (or threads) belonging to a
thread group (corresponding to a traditional Unix process). This is a commonly
needed aggregation that is more efficiently done by the kernel.

Userspace utilities, particularly resource management applications, can also
aggregate delay statistics into arbitrary groups. To enable this, delay
statistics of a task are available both during its lifetime and on its
exit, ensuring that continuous and complete monitoring can be done.

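As an illustration of such userspace aggregation, the following is a minimal
sketch that sums per-task delay counters into arbitrary groups. It assumes the
counters have already been collected through taskstats; the pids, the sample
values, the group names and the helper ``aggregate_by_group`` are hypothetical.
Only the field names are taken from include/uapi/linux/taskstats.h.

```python
from collections import defaultdict

# Hypothetical per-task samples keyed by pid. Field names follow
# include/uapi/linux/taskstats.h; the values (nanoseconds) are made up.
samples = {
    101: {"cpu_delay_total": 4_000_000, "blkio_delay_total": 1_000_000},
    102: {"cpu_delay_total": 2_500_000, "blkio_delay_total": 0},
    205: {"cpu_delay_total": 500_000, "blkio_delay_total": 3_000_000},
}

# Hypothetical grouping policy chosen by a resource manager.
group_of = {101: "batch", 102: "batch", 205: "interactive"}


def aggregate_by_group(samples, group_of):
    """Sum each delay counter over all tasks assigned to the same group."""
    totals = defaultdict(lambda: defaultdict(int))
    for pid, counters in samples.items():
        # Tasks without an explicit assignment land in a catch-all group.
        group = group_of.get(pid, "ungrouped")
        for name, value in counters.items():
            totals[group][name] += value
    # Convert to plain dicts so the result prints and compares cleanly.
    return {group: dict(fields) for group, fields in totals.items()}


groups = aggregate_by_group(samples, group_of)
print(groups)
```

Because a task's final statistics are also delivered on exit, a tool built
along these lines could fold the last record of an exiting task into its
group total instead of losing it.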

Interface
---------

Delay accounting uses the taskstats interface, which is described
in detail in a separate document in this directory. Taskstats returns a
generic data structure to userspace corresponding to per-pid and per-tgid
statistics. The delay accounting functionality populates specific fields of
this structure. See

	include/uapi/linux/taskstats.h

for a description of the fields pertaining to delay accounting.
These fields are generally counters returning the cumulative
delay seen for CPU, synchronous block I/O, swapin, memory reclaim, page cache
thrashing, direct compaction, write-protect copy, IRQ/SOFTIRQ, etc.

Taking the difference of two successive readings of a given
counter (say cpu_delay_total) for a task will give the delay
experienced by the task waiting for the corresponding resource
in that interval.

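The subtraction described above can be sketched as follows. This is only an
illustration: the two readings are invented, the helper names are
hypothetical, and the counters are assumed to be in nanoseconds and not to
have wrapped between the readings. The ``cpu_count`` field is the event
counter that accompanies ``cpu_delay_total`` in taskstats.h.

```python
def delay_in_interval(earlier, later, field="cpu_delay_total"):
    """Return the delay (ns) accumulated between two readings of a counter."""
    return later[field] - earlier[field]


def average_delay_ms(earlier, later, total_field="cpu_delay_total",
                     count_field="cpu_count"):
    """Average delay per delay event in the interval, in milliseconds."""
    events = later[count_field] - earlier[count_field]
    if events == 0:
        # No new delay events were recorded between the two readings.
        return 0.0
    return delay_in_interval(earlier, later, total_field) / events / 1e6


# Two hypothetical readings for one task; the numbers are invented.
first = {"cpu_count": 8, "cpu_delay_total": 3_382_277}
second = {"cpu_count": 12, "cpu_delay_total": 5_382_277}

print(delay_in_interval(first, second))   # delay in ns over the interval
print(average_delay_ms(first, second))    # mean delay per event, in ms
```
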
When a task exits, records containing the per-task statistics
are sent to userspace without requiring a command. If it is the last exiting
task of a thread group, the per-tgid statistics are also sent. More details
are given in the taskstats interface description.

The getdelays.c userspace utility in the tools/accounting directory allows
simple commands to be run and the corresponding delay statistics to be
displayed. It also serves as an example of using the taskstats interface.

Usage
-----

Compile the kernel with::

	CONFIG_TASK_DELAY_ACCT=y
	CONFIG_TASKSTATS=y

Delay accounting is disabled by default at boot up.
To enable it, add::

	delayacct

to the kernel boot options. The rest of the instructions below assume this has
been done. Alternatively, use the kernel.task_delayacct sysctl to switch the
state at runtime. Note, however, that only tasks started after enabling it will
have delayacct information.

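Before collecting statistics it can be useful to check the current state of
the sysctl. The sketch below reads the procfs file that backs
kernel.task_delayacct; the helper name ``delayacct_enabled`` is hypothetical,
and the code assumes procfs is mounted at /proc. Changing the value (for
instance with ``sysctl -w kernel.task_delayacct=1``) requires root privileges
and is not attempted here.

```python
from pathlib import Path

# procfs file backing the kernel.task_delayacct sysctl.
SYSCTL_PATH = "/proc/sys/kernel/task_delayacct"


def delayacct_enabled(path=SYSCTL_PATH):
    """Return True/False for the sysctl state, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # For example: kernel built without CONFIG_TASK_DELAY_ACCT,
        # or procfs not available in this environment.
        return None


print("delay accounting enabled:", delayacct_enabled())
```
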
After the system has booted up, use a utility similar to getdelays.c to
access the delays seen by a given task or a task group (tgid). The utility
also allows a given command to be executed and the corresponding delays to
be seen.

General format of the getdelays command::

	getdelays [-dilv] [-t tgid] [-p pid]

Get delays, since system boot, for pid 10::

	# ./getdelays -d -p 10
	(output similar to next case)

Get sum of delays, since system boot, for all pids with tgid 5::

	# ./getdelays -d -t 5
	print delayacct stats ON
	TGID	5


	CPU             count     real total  virtual total    delay total  delay average
	                    8        7000000        6872122        3382277          0.423ms
	IO              count    delay total  delay average
	                    0              0          0.000ms
	SWAP            count    delay total  delay average
	                    0              0          0.000ms
	RECLAIM         count    delay total  delay average
	                    0              0          0.000ms
	THRASHING       count    delay total  delay average
	                    0              0          0.000ms
	COMPACT         count    delay total  delay average
	                    0              0          0.000ms
	WPCOPY          count    delay total  delay average
	                    0              0          0.000ms
	IRQ             count    delay total  delay average
	                    0              0          0.000ms

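The "delay average" column in the sample output is consistent with dividing
"delay total" by "count" and converting to milliseconds. The following sketch
reproduces the CPU row under the assumption that "delay total" is reported in
nanoseconds, as the taskstats counters are; the helper name
``delay_average_ms`` is hypothetical.

```python
def delay_average_ms(delay_total_ns, count):
    """Mean delay per event in milliseconds; 0.0 when nothing was counted."""
    if count == 0:
        return 0.0
    return delay_total_ns / count / 1_000_000


# CPU row from the sample output: count=8, delay total=3382277.
print(f"{delay_average_ms(3382277, 8):.3f}ms")  # prints 0.423ms
```
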
Get I/O accounting for pid 1 (this works only with -p)::

	# ./getdelays -i -p 1
	printing IO accounting
	linuxrc: read=65536, write=0, cancelled_write=0

The above command can be used with -v to get more debug information.
134