===================
Block IO Controller
===================

Overview
========
The cgroup subsystem "blkio" implements the block IO controller. There is a
need for various kinds of IO control policies (such as proportional bandwidth
and maximum bandwidth), both at leaf nodes and at intermediate nodes in a
storage hierarchy. The plan is to use the same cgroup-based management
interface for the blkio controller and to switch IO policies in the background
based on user options.

One IO control policy is the throttling policy, which can be used to
specify upper IO rate limits on devices. This policy is implemented in the
generic block layer and can be used on leaf nodes as well as on higher-level
logical devices such as device mapper.

HOWTO
=====

Throttling/Upper Limit policy
-----------------------------
Enable the block IO controller::

        CONFIG_BLK_CGROUP=y

Enable throttling in the block layer::

        CONFIG_BLK_DEV_THROTTLING=y
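
Whether a running kernel was built with these two options can be checked
against its config file. The following is a minimal sketch: the helper name
``check_blkio_config`` is made up for illustration, and the location of the
config file (for example ``/boot/config-$(uname -r)``) is an assumption that
varies between distributions.

```shell
#!/bin/sh
# Report whether the two options named in this HOWTO are set to "y" in a
# kernel config file. check_blkio_config is a hypothetical helper name; the
# config file path is passed in because its location is distribution specific.
check_blkio_config() {
    config_file=$1
    for opt in CONFIG_BLK_CGROUP CONFIG_BLK_DEV_THROTTLING; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}

# Possible invocation (path is an assumption):
#   check_blkio_config "/boot/config-$(uname -r)"
```

Passing the file as an argument keeps the check usable on systems that only
expose the configuration elsewhere, once it has been extracted to a plain file.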

Mount the blkio controller (see cgroups.txt, Why are cgroups needed?)::

        mount -t cgroup -o blkio none /sys/fs/cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format
for the policy is "<major>:<minor>  <bytes_per_second>"::

        echo "8:16  1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device

This puts a limit of 1MB/second on reads issued by the root group to the
device with major/minor number 8:16.
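
Since the rate must be given in bytes per second, it can help to build the
rule string from a more convenient unit. The sketch below assumes a rate
expressed in MiB/s; ``make_bps_rule`` is a hypothetical helper name, not part
of any kernel interface.

```shell
#!/bin/sh
# Build a "<major>:<minor>  <bytes_per_second>" rule from a rate in MiB/s.
# make_bps_rule is a made-up name used only for this illustration.
make_bps_rule() {
    major=$1
    minor=$2
    mib_per_sec=$3
    echo "${major}:${minor}  $((mib_per_sec * 1024 * 1024))"
}

make_bps_rule 8 16 1    # prints "8:16  1048576"

# The result could then be written to the limit file shown above (needs root
# and a mounted blkio hierarchy):
#   make_bps_rule 8 16 1 > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
```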

Run dd to read a file and check whether the rate is throttled to 1MB/s::

        # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
        1024+0 records in
        1024+0 records out
        4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s

Limits for writes can be set using the blkio.throttle.write_bps_device file.
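
The duration reported by dd above follows from simple arithmetic: 1024 blocks
of 4096 bytes are 4194304 bytes, which take about 4 seconds at 1048576 bytes
per second. A small sketch of that calculation, with ``expected_seconds`` as an
illustrative helper name:

```shell
#!/bin/sh
# Minimum duration, in whole seconds (rounded up), of a transfer of
# total_bytes under a limit of bps bytes per second.
# expected_seconds is a hypothetical name used only for this example.
expected_seconds() {
    total_bytes=$1
    bps=$2
    echo $(( (total_bytes + bps - 1) / bps ))
}

# bs=4K count=1024 under a 1048576 bytes/second limit
expected_seconds $((4096 * 1024)) 1048576    # prints 4
```

A measured time well below this estimate would suggest that the rule is not
being applied to the IO in question.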

Hierarchical Cgroups
====================

Throttling implements hierarchy support; however, this support is enabled
if and only if "sane_behavior" is enabled from the cgroup side, which is
currently a development option and not publicly available.

Suppose somebody creates a hierarchy as follows::

        root
        /  \
     test1 test2
        |
     test3

Throttling with "sane_behavior" will handle the
hierarchy correctly. For throttling, all limits apply
to the whole subtree while all statistics are local to the IOs
directly generated by tasks in that cgroup.
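
Because limits apply to the whole subtree, IO from a cgroup such as test3 is
bounded by its own limit and by the limits of every ancestor. The sketch below
computes that upper bound for one path through the tree. It only illustrates
the arithmetic and is not how the kernel implements throttling; the helper
name and the use of 0 for "no limit configured" are assumptions of this
example. Sibling groups sharing an ancestor's limit can reduce the achievable
rate further.

```shell
#!/bin/sh
# Smallest limit among the arguments, which are the limits of the cgroups on
# the path from the root down to the cgroup of interest.
# A value of 0 stands for "no limit configured" in this sketch.
effective_limit() {
    min=0
    for limit in "$@"; do
        if [ "$limit" -eq 0 ]; then
            continue
        fi
        if [ "$min" -eq 0 ] || [ "$limit" -lt "$min" ]; then
            min=$limit
        fi
    done
    echo "$min"
}

# root: no limit, test1: 2097152 bytes/s, test3: 4194304 bytes/s
effective_limit 0 2097152 4194304    # prints 2097152
```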

Throttling without "sane_behavior" enabled from the cgroup side will in
practice treat all groups as being at the same level, as if the hierarchy
looked like the following::

                pivot
             /  /   \  \
        root  test1 test2  test3

Various user visible config options
===================================

  CONFIG_BLK_CGROUP
          Block IO controller.

  CONFIG_BFQ_CGROUP_DEBUG
          Debug help. Currently, some additional stats files show up in the
          cgroup if this option is enabled.

  CONFIG_BLK_DEV_THROTTLING
          Enable block device throttling support in the block layer.

Details of cgroup files
=======================

Proportional weight policy files
--------------------------------

  blkio.bfq.weight
          Specifies the per-cgroup weight. This is the default weight of the
          group on all devices unless overridden by a per-device rule
          (see `blkio.bfq.weight_device` below).

          The currently allowed range of weights is 1 to 1000. For more
          details, see Documentation/block/bfq-iosched.rst.

  blkio.bfq.weight_device
          Specifies per-cgroup, per-device weights, overriding the default
          group weight. For more details, see
          Documentation/block/bfq-iosched.rst.

          The format is as follows::

            # echo dev_maj:dev_minor weight > blkio.bfq.weight_device

          Configure weight=300 on /dev/sdb (8:16) in this cgroup::

            # echo 8:16 300 > blkio.bfq.weight_device
            # cat blkio.bfq.weight_device
            dev     weight
            8:16    300

          Configure weight=500 on /dev/sda (8:0) in this cgroup::

            # echo 8:0 500 > blkio.bfq.weight_device
            # cat blkio.bfq.weight_device
            dev     weight
            8:0     500
            8:16    300

          Remove specific weight for /dev/sda in this cgroup::

            # echo 8:0 0 > blkio.bfq.weight_device
            # cat blkio.bfq.weight_device
            dev     weight
            8:16    300

  blkio.time
          Disk time allocated to the cgroup per device, in milliseconds. The
          first two fields specify the major and minor number of the device
          and the third field specifies the disk time allocated to the group
          in milliseconds.

  blkio.sectors
          Number of sectors transferred to/from disk by the group. The first
          two fields specify the major and minor number of the device and
          the third field specifies the number of sectors transferred by the
          group to/from the device.

  blkio.io_service_bytes
          Number of bytes transferred to/from the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. The first two fields specify the major and minor number
          of the device, the third field specifies the operation type and the
          fourth field specifies the number of bytes.

  blkio.io_serviced
          Number of IOs (bio) issued to the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. The first two fields specify the major and minor number
          of the device, the third field specifies the operation type and the
          fourth field specifies the number of IOs.

  blkio.io_service_time
          Total amount of time between request dispatch and request completion
          for the IOs done by this cgroup. This is in nanoseconds to make it
          meaningful for flash devices too. For devices with a queue depth of
          1, this time represents the actual service time. When queue_depth > 1,
          that is no longer true as requests may be served out of order. This
          may cause the service time for a given IO to include the service time
          of multiple IOs when served out of order, which may result in total
          io_service_time > actual time elapsed. This time is further divided
          by the type of operation - read or write, sync or async. The first
          two fields specify the major and minor number of the device, the
          third field specifies the operation type and the fourth field
          specifies the io_service_time in ns.

  blkio.io_wait_time
          Total amount of time the IOs for this cgroup spent waiting in the
          scheduler queues for service. This can be greater than the total time
          elapsed since it is the cumulative io_wait_time for all IOs. It is
          not a measure of the total time the cgroup spent waiting but rather
          a measure of the wait_time for its individual IOs. For devices with
          queue_depth > 1, this metric does not include the time spent waiting
          for service from when the IO is dispatched to the device until it
          actually gets serviced (there might be a time lag here due to
          re-ordering of requests by the device). This is in nanoseconds to
          make it meaningful for flash devices too. This time is further
          divided by the type of operation - read or write, sync or async.
          The first two fields specify the major and minor number of the
          device, the third field specifies the operation type and the fourth
          field specifies the io_wait_time in ns.

  blkio.io_merged
          Total number of bios/requests merged into requests belonging to this
          cgroup. This is further divided by the type of operation - read or
          write, sync or async.

  blkio.io_queued
          Total number of requests queued up at any given instant for this
          cgroup. This is further divided by the type of operation - read or
          write, sync or async.

  blkio.avg_queue_size
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          The average queue size for this cgroup over the entire time of this
          cgroup's existence. Queue size samples are taken each time one of the
          queues of this cgroup gets a timeslice.

  blkio.group_wait_time
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          This is the amount of time the cgroup had to wait since it became busy
          (i.e., went from 0 to 1 request queued) to get a timeslice for one of
          its queues. This is different from the io_wait_time, which is the
          cumulative total of the amount of time spent by each IO in that cgroup
          waiting in the scheduler queue. This is in nanoseconds. If this is
          read when the cgroup is in a waiting (for timeslice) state, the stat
          will only report the group_wait_time accumulated until the last time
          it got a timeslice and will not include the current delta.

  blkio.empty_time
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          This is the amount of time a cgroup spends without any pending
          requests when not being served, i.e., it does not include any time
          spent idling for one of the queues of the cgroup. This is in
          nanoseconds. If this is read when the cgroup is in an empty state,
          the stat will only report the empty_time accumulated until the last
          time it had a pending request and will not include the current delta.

  blkio.idle_time
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
          This is the amount of time spent by the IO scheduler idling for a
          given cgroup in anticipation of a better request than the existing
          ones from other queues/cgroups. This is in nanoseconds. If this is
          read when the cgroup is in an idling state, the stat will only report
          the idle_time accumulated until the last idle period and will not
          include the current delta.

  blkio.dequeue
          Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
          gives statistics about how many times a group was dequeued
          from the service tree of the device. The first two fields specify
          the major and minor number of the device and the third field
          specifies the number of times a group was dequeued from a
          particular device.

  blkio.*_recursive
          Recursive version of various stats. These files show the
          same information as their non-recursive counterparts but
          include stats from all the descendant cgroups.
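
The per-operation stat files described above can be post-processed with
standard text tools. The following sketch sums the values for one operation
type across all devices. It assumes each data line has the layout
``8:16 Read 4096`` (device as ``major:minor``, operation type, value); the
helper name ``sum_by_op`` and the example path are illustrative only.

```shell
#!/bin/sh
# Sum the values for one operation type across all devices.
# Reads stat lines on stdin, assumed here to look like "8:16 Read 4096".
# Lines with a different operation type (or summary lines) are ignored.
sum_by_op() {
    awk -v op="$1" '$2 == op { total += $3 } END { print total + 0 }'
}

# Possible invocation against a mounted hierarchy (path is an assumption):
#   sum_by_op Read < /sys/fs/cgroup/blkio/blkio.io_service_bytes
```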

Throttling/Upper limit policy files
-----------------------------------

  blkio.throttle.read_bps_device
          Specifies the upper limit on the READ rate from the device. The IO
          rate is specified in bytes per second. Rules are per device. The
          format is as follows::

            echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device

  blkio.throttle.write_bps_device
          Specifies the upper limit on the WRITE rate to the device. The IO
          rate is specified in bytes per second. Rules are per device. The
          format is as follows::

            echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device

  blkio.throttle.read_iops_device
          Specifies the upper limit on the READ rate from the device. The IO
          rate is specified in IOs per second. Rules are per device. The
          format is as follows::

            echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device

  blkio.throttle.write_iops_device
          Specifies the upper limit on the WRITE rate to the device. The IO
          rate is specified in IOs per second. Rules are per device. The
          format is as follows::

            echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device

          Note: If both BW and IOPS rules are specified for a device, then IO
          is subject to both constraints.

  blkio.throttle.io_serviced
          Number of IOs (bio) issued to the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. The first two fields specify the major and minor number
          of the device, the third field specifies the operation type and the
          fourth field specifies the number of IOs.

  blkio.throttle.io_service_bytes
          Number of bytes transferred to/from the disk by the group. These
          are further divided by the type of operation - read or write, sync
          or async. The first two fields specify the major and minor number
          of the device, the third field specifies the operation type and the
          fourth field specifies the number of bytes.
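
When a bytes-per-second rule and an IOs-per-second rule are both set for a
device, whichever is reached first limits the IO. The sketch below estimates
the resulting throughput ceiling under the simplifying assumption that every
IO has the same size and is counted as a single IO. The helper name is
hypothetical, and the kernel's actual accounting may differ in detail.

```shell
#!/bin/sh
# Upper bound on throughput, in bytes per second, when a bps rule and an
# iops rule both apply and every IO is io_size bytes (an assumption of this
# sketch). combined_bps_ceiling is a made-up name for illustration.
combined_bps_ceiling() {
    bps_limit=$1
    iops_limit=$2
    io_size=$3
    iops_bytes=$((iops_limit * io_size))
    if [ "$iops_bytes" -lt "$bps_limit" ]; then
        echo "$iops_bytes"
    else
        echo "$bps_limit"
    fi
}

# 1048576 bytes/s and 100 IOs/s with 4096-byte IOs: the IOPS rule binds first
combined_bps_ceiling 1048576 100 4096    # prints 409600
```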

Common files among various policies
-----------------------------------

  blkio.reset_stats
          Writing an integer to this file resets all the stats for that
          cgroup.