===================
Block IO Controller
===================

Overview
========
cgroup subsys "blkio" implements the block IO controller. There seems to be
a need for various kinds of IO control policies (like proportional BW, max BW)
both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
The plan is to use the same cgroup-based management interface for the blkio
controller and, based on user options, switch IO policies in the background.

One IO control policy is the throttling policy, which can be used to
specify upper IO rate limits on devices. This policy is implemented in the
generic block layer and can be used on leaf nodes as well as on higher-level
logical devices like device mapper.

HOWTO
=====

Throttling/Upper Limit policy
-----------------------------
Enable Block IO controller::

    CONFIG_BLK_CGROUP=y

Enable throttling in the block layer::

    CONFIG_BLK_DEV_THROTTLING=y

Mount the blkio controller (see cgroups.txt, Why are cgroups needed?)::

    mount -t cgroup -o blkio none /sys/fs/cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format
for the policy is "<major>:<minor> <bytes_per_second>"::

    echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device

This will put a limit of 1 MiB/s (1048576 bytes per second) on reads
happening for the root group on the device with major/minor number 8:16.

Run dd to read a file on that device and check whether the rate is
throttled to 1 MiB/s::

    # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
    1024+0 records in
    1024+0 records out
    4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s

Limits for writes can be set using the blkio.throttle.write_bps_device file.
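
The steps above hard-code the device number 8:16. As a minimal sketch, assuming a POSIX shell, GNU coreutils ``stat`` (whose ``%t``/``%T`` formats print a device node's major and minor numbers in hexadecimal) and an already mounted blkio hierarchy, the number lookup and the rule write can be wrapped in two small helpers. The names ``blkio_devnum`` and ``blkio_set_limit`` are hypothetical and are not part of any kernel tooling.

```sh
# Hypothetical helpers (not shipped with the kernel); source them in a shell.

# blkio_devnum DEVICE
# Print "<major>:<minor>" (decimal) for a device node. GNU stat's %t and %T
# print the major and minor numbers in hexadecimal, so convert them here.
blkio_devnum() {
    nums=$(stat -L -c '%t %T' "$1") || return 1
    set -- $nums
    printf '%d:%d\n' "0x$1" "0x$2"
}

# blkio_set_limit CGROUP_DIR FILE DEVICE VALUE
# Write the rule "<major>:<minor> VALUE" into CGROUP_DIR/FILE, where FILE is
# assumed to be one of the blkio.throttle.*_device files.
blkio_set_limit() {
    devnum=$(blkio_devnum "$3") || return 1
    echo "$devnum $4" > "$1/$2"
}
```

Run as root, ``blkio_set_limit /sys/fs/cgroup/blkio blkio.throttle.read_bps_device /dev/sdb 1048576`` would write the same rule as the ``echo`` command above when /dev/sdb is 8:16; passing ``blkio.throttle.write_bps_device`` instead limits writes. The helper only formats the rule: the kernel still validates what is written.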

Hierarchical Cgroups
====================

Throttling implements hierarchy support; however, throttling's hierarchy
support is enabled if and only if "sane_behavior" is enabled on the cgroup
side, which currently is a development option and not publicly available.

If somebody created a hierarchy like the following::

            root
           /    \
        test1  test2
          |
        test3

Throttling with "sane_behavior" will handle the hierarchy correctly. For
throttling, all limits apply to the whole subtree while all statistics are
local to the IOs directly generated by tasks in that cgroup.

Without "sane_behavior" enabled on the cgroup side, throttling effectively
treats all groups as if they were at the same level, as in the following::

               pivot
         /    /     \     \
      root  test1  test2  test3

Various user visible config options
===================================

  CONFIG_BLK_CGROUP
    Block IO controller.

  CONFIG_BFQ_CGROUP_DEBUG
    Debugging aid. Currently, some additional stats files show up in the
    cgroup if this option is enabled.

  CONFIG_BLK_DEV_THROTTLING
    Enable block device throttling support in the block layer.

Details of cgroup files
=======================

Proportional weight policy files
--------------------------------

  blkio.bfq.weight
    Specifies per cgroup weight. This is the default weight of the group
    on all the devices until and unless overridden by a per-device rule
    (see `blkio.bfq.weight_device` below).

    The currently allowed range of weights is from 1 to 1000. For more
    details, see Documentation/block/bfq-iosched.rst.

  blkio.bfq.weight_device
    Specifies per cgroup per device weights, overriding the default group
    weight. For more details, see Documentation/block/bfq-iosched.rst.

    The format is as follows::

        # echo dev_maj:dev_minor weight > blkio.bfq.weight_device

    Configure weight=300 on /dev/sdb (8:16) in this cgroup::

        # echo 8:16 300 > blkio.bfq.weight_device
        # cat blkio.bfq.weight_device
        dev     weight
        8:16    300

    Configure weight=500 on /dev/sda (8:0) in this cgroup::

        # echo 8:0 500 > blkio.bfq.weight_device
        # cat blkio.bfq.weight_device
        dev     weight
        8:0     500
        8:16    300

    Remove the device-specific weight for /dev/sda in this cgroup::

        # echo 8:0 0 > blkio.bfq.weight_device
        # cat blkio.bfq.weight_device
        dev     weight
        8:16    300

  blkio.time
    Disk time allocated to the cgroup per device, in milliseconds. The first
    two fields specify the major and minor number of the device and the
    third field specifies the disk time allocated to the group in
    milliseconds.

  blkio.sectors
    Number of sectors transferred to/from disk by the group.
    The first two fields specify the major and minor number of the device
    and the third field specifies the number of sectors transferred by the
    group to/from the device.

  blkio.io_service_bytes
    Number of bytes transferred to/from the disk by the group. These
    are further divided by the type of operation (read or write, sync
    or async). The first two fields specify the major and minor number of
    the device, the third field specifies the operation type and the
    fourth field specifies the number of bytes.

  blkio.io_serviced
    Number of IOs (bios) issued to the disk by the group. These
    are further divided by the type of operation (read or write, sync
    or async). The first two fields specify the major and minor number of
    the device, the third field specifies the operation type and the
    fourth field specifies the number of IOs.

  blkio.io_service_time
    Total amount of time between request dispatch and request completion
    for the IOs done by this cgroup. This is in nanoseconds to make it
    meaningful for flash devices too. For devices with a queue depth of 1,
    this time represents the actual service time. When queue_depth > 1,
    that is no longer true as requests may be served out of order.
    This may cause the service time for a given IO to include the service
    time of multiple IOs when served out of order, which may result in a
    total io_service_time greater than the actual time elapsed. This time is
    further divided by the type of operation (read or write, sync or async).
    The first two fields specify the major and minor number of the device,
    the third field specifies the operation type and the fourth field
    specifies the io_service_time in ns.

  blkio.io_wait_time
    Total amount of time the IOs for this cgroup spent waiting in the
    scheduler queues for service. This can be greater than the total time
    elapsed since it is the cumulative io_wait_time for all IOs. It is not a
    measure of the total time the cgroup spent waiting but rather a measure
    of the wait_time for its individual IOs. For devices with
    queue_depth > 1, this metric does not include the time spent waiting
    between the moment the IO is dispatched to the device and the moment it
    is actually serviced (there might be a time lag here due to re-ordering
    of requests by the device). This is in nanoseconds to make it meaningful
    for flash devices too. This time is further divided by the type of
    operation (read or write, sync or async).
    The first two fields specify the major and minor number of the device,
    the third field specifies the operation type and the fourth field
    specifies the io_wait_time in ns.

  blkio.io_merged
    Total number of bios/requests merged into requests belonging to this
    cgroup. This is further divided by the type of operation (read or
    write, sync or async).

  blkio.io_queued
    Total number of requests queued up at any given instant for this
    cgroup. This is further divided by the type of operation (read or
    write, sync or async).

  blkio.avg_queue_size
    Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
    The average queue size for this cgroup over the entire time of this
    cgroup's existence. Queue size samples are taken each time one of the
    queues of this cgroup gets a timeslice.

  blkio.group_wait_time
    Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
    This is the amount of time the cgroup had to wait since it became busy
    (i.e., went from 0 to 1 request queued) to get a timeslice for one of
    its queues. This is different from the io_wait_time, which is the
    cumulative total of the amount of time spent by each IO in that cgroup
    waiting in the scheduler queue. This is in nanoseconds.
    If this is read when the cgroup is in a waiting (for timeslice) state,
    the stat will only report the group_wait_time accumulated until the
    last time it got a timeslice and will not include the current delta.

  blkio.empty_time
    Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
    This is the amount of time a cgroup spends without any pending
    requests when not being served, i.e., it does not include any time
    spent idling for one of the queues of the cgroup. This is in
    nanoseconds. If this is read when the cgroup is in an empty state,
    the stat will only report the empty_time accumulated until the last
    time it had a pending request and will not include the current delta.

  blkio.idle_time
    Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
    This is the amount of time spent by the IO scheduler idling for a
    given cgroup in anticipation of a better request than the existing ones
    from other queues/cgroups. This is in nanoseconds. If this is read
    when the cgroup is in an idling state, the stat will only report the
    idle_time accumulated until the last idle period and will not include
    the current delta.

  blkio.dequeue
    Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
    This gives statistics about how many times a group was dequeued
    from the service tree of the device. The first two fields specify the
    major and minor number of the device and the third field specifies the
    number of times a group was dequeued from a particular device.

  blkio.*_recursive
    Recursive version of various stats. These files show the
    same information as their non-recursive counterparts but
    include stats from all the descendant cgroups.

Throttling/Upper limit policy files
-----------------------------------

  blkio.throttle.read_bps_device
    Specifies the upper limit on the READ rate from the device. The IO rate
    is specified in bytes per second. Rules are per device. The format is
    as follows::

        echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device

  blkio.throttle.write_bps_device
    Specifies the upper limit on the WRITE rate to the device. The IO rate
    is specified in bytes per second. Rules are per device.
    The format is as follows::

        echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device

  blkio.throttle.read_iops_device
    Specifies the upper limit on the READ rate from the device. The IO rate
    is specified in IOs per second. Rules are per device. The format is
    as follows::

        echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device

  blkio.throttle.write_iops_device
    Specifies the upper limit on the WRITE rate to the device. The IO rate
    is specified in IOs per second. Rules are per device. The format is
    as follows::

        echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device

    Note: If both BW and IOPS rules are specified for a device, then IO is
    subjected to both constraints.

  blkio.throttle.io_serviced
    Number of IOs (bios) issued to the disk by the group. These
    are further divided by the type of operation (read or write, sync
    or async). The first two fields specify the major and minor number of
    the device, the third field specifies the operation type and the
    fourth field specifies the number of IOs.

  blkio.throttle.io_service_bytes
    Number of bytes transferred to/from the disk by the group. These
    are further divided by the type of operation (read or write, sync
    or async). The first two fields specify the major and minor number of
    the device, the third field specifies the operation type and the
    fourth field specifies the number of bytes.

Common files among various policies
-----------------------------------

  blkio.reset_stats
    Writing an int to this file will result in resetting all the stats
    for that cgroup.
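
The per-device statistics files described in this document can be post-processed with standard tools. The following is a minimal sketch, assuming each per-device line has the form ``<major>:<minor> <operation> <value>``, with the device printed as a single ``major:minor`` token and operation names such as ``Read``, ``Write`` and ``Total``; check the exact layout on your kernel before relying on it. The helper name ``blkio_stat`` is hypothetical.

```sh
# blkio_stat FILE DEVICE OPERATION
# Hypothetical helper: print the value recorded for DEVICE ("major:minor")
# and OPERATION in a blkio statistics file such as
# blkio.throttle.io_service_bytes. Only three-field lines are considered,
# so a two-field summary line (e.g. "Total <value>") is skipped.
# Returns non-zero if no matching line exists.
blkio_stat() {
    awk -v dev="$2" -v op="$3" '
        NF == 3 && $1 == dev && $2 == op { print $3; found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, ``blkio_stat /sys/fs/cgroup/blkio/blkio.throttle.io_service_bytes 8:16 Read`` would print the bytes read from device 8:16 by the root group in the HOWTO setup. Reading the value before and after a workload, or writing to ``blkio.reset_stats`` first (e.g. ``echo 1 > blkio.reset_stats``), isolates the IO attributable to that workload.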