==============================
Deadline IO scheduler tunables
==============================

This little file attempts to document how the deadline io scheduler works.
In particular, it will clarify the meaning of the exposed tunables that may be
of interest to power users.

Selecting IO schedulers
-----------------------
Refer to Documentation/block/switching-sched.rst for information on
selecting an io scheduler on a per-device basis.

------------------------------------------------------------------------------

read_expire (in ms)
-----------------------

The goal of the deadline io scheduler is to attempt to guarantee a start
service time for a request. As we focus mainly on read latencies, this is
tunable. When a read request first enters the io scheduler, it is assigned
a deadline that is the current time + the read_expire value in units of
milliseconds.


write_expire (in ms)
-----------------------

Similar to read_expire mentioned above, but for writes.
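
The deadline assignment described above can be sketched as follows. This is a
simplified model, not the kernel's implementation (which keeps per-direction
FIFO lists and works in jiffies internally); the function name is illustrative,
and the 500/5000 ms values mirror the usual read_expire/write_expire defaults.

```python
# Simplified model of deadline assignment in the deadline io scheduler
# (illustrative only; the kernel works in jiffies, not milliseconds).

READ, WRITE = 0, 1

# Per-direction expiry tunables in milliseconds; 500/5000 reflect the
# usual read_expire/write_expire defaults (assumed here for the example).
expire_ms = {READ: 500, WRITE: 5000}

def assign_deadline(direction, now_ms):
    """A request's deadline is its arrival time plus the expire
    tunable for its data direction."""
    return now_ms + expire_ms[direction]

read_deadline = assign_deadline(READ, now_ms=10_000)    # 10_500
write_deadline = assign_deadline(WRITE, now_ms=10_000)  # 15_000
```

Lowering read_expire thus tightens the start-service guarantee for reads at
the cost of forcing more deadline-driven (seeky) dispatches.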


fifo_batch (number of requests)
------------------------------------

Requests are grouped into ``batches`` of a particular data direction (read or
write) which are serviced in increasing sector order. To limit extra seeking,
deadline expiries are only checked between batches. fifo_batch controls the
maximum number of requests per batch.

This parameter tunes the balance between per-request latency and aggregate
throughput. When low latency is the primary concern, smaller is better (where
a value of 1 yields first-come first-served behaviour). Increasing fifo_batch
generally improves throughput, at the cost of latency variation.


writes_starved (number of dispatches)
--------------------------------------

When we have to move requests from the io scheduler queue to the block
device dispatch queue, we always give a preference to reads. However, we
don't want to starve writes indefinitely either. So writes_starved controls
how many times we give preference to reads over writes. When that has been
done writes_starved number of times, we dispatch some writes based on the
same criteria as reads.
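
The interaction between fifo_batch and writes_starved can be sketched with a
toy dispatch loop. This is a simplified model, not the kernel's code: it
ignores sector ordering and deadline expiry checks, and the function name is
illustrative.

```python
# Toy dispatch loop illustrating fifo_batch and writes_starved.
from collections import deque

def dispatch_all(read_fifo, write_fifo, fifo_batch, writes_starved):
    """Drain both FIFOs, preferring reads, but dispatch a write batch
    once reads have been preferred writes_starved times while writes
    were waiting. Each batch takes at most fifo_batch requests from
    one data direction."""
    order = []
    starved = 0  # how many times writes have been passed over
    while read_fifo or write_fifo:
        if write_fifo and (not read_fifo or starved >= writes_starved):
            fifo = write_fifo
            starved = 0
        else:
            fifo = read_fifo
            if write_fifo:
                starved += 1
        # In the real scheduler, deadlines are only re-checked once
        # this batch completes.
        for _ in range(min(fifo_batch, len(fifo))):
            order.append(fifo.popleft())
    return order

reads = deque(f"R{i}" for i in range(1, 7))
writes = deque(f"W{i}" for i in range(1, 5))
print(dispatch_all(reads, writes, fifo_batch=2, writes_starved=2))
# ['R1', 'R2', 'R3', 'R4', 'W1', 'W2', 'R5', 'R6', 'W3', 'W4']
```

Note how two read batches are serviced before the first write batch: with
writes_starved=2, writes are only dispatched after being passed over twice.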


front_merges (bool)
----------------------

Sometimes it happens that a request enters the io scheduler that is contiguous
with a request that is already on the queue. Either it fits in the back of that
request, or it fits at the front. That is called either a back merge candidate
or a front merge candidate. Due to the way files are typically laid out,
back merges are much more common than front merges. For some work loads, you
may even know that it is a waste of time to spend any time attempting to
front merge requests. Setting front_merges to 0 disables this functionality.
Front merges may still occur due to the cached last_merge hint, but since
that comes at basically 0 cost we leave that on. We simply disable the
rbtree front sector lookup when the io scheduler merge function is called.


Nov 11 2002, Jens Axboe <jens.axboe@oracle.com>