=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it.  The core is careful to
avoid asking about anything that is migrating.  This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios, rather than requests, it's easy for the policy
to get fooled by many small bios.  For this reason the core target
issues periodic ticks to the policy.  It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
for each tick.  The core ticks by watching bios complete, and so
tries to see when the io scheduler has let the ios run.
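
The once-per-tick suggestion above can be illustrated with a small
sketch.  This is not the kernel's policy interface: ``struct toy_policy``,
``struct block_stats`` and both functions are hypothetical names invented
for this example, and the only assumption is that the core calls a tick
hook periodically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-block bookkeeping; not the kernel's actual layout. */
struct block_stats {
	uint32_t hit_count;
	uint32_t last_tick;	/* tick in which hit_count was last bumped */
	bool seen;		/* false until the first hit is recorded */
};

/* Hypothetical policy state: the core advances 'tick' periodically. */
struct toy_policy {
	uint32_t tick;
};

/* Called by the (assumed) core each time it decides a tick has passed. */
static void toy_policy_tick(struct toy_policy *p)
{
	assert(p);
	p->tick++;
}

/*
 * Record a hit for a block, but at most once per tick, so that a burst
 * of small bios to the same block counts as a single hit.
 * Returns true if the hit was counted.
 */
static bool toy_policy_record_hit(const struct toy_policy *p,
				  struct block_stats *b)
{
	assert(p && b);

	if (b->seen && b->last_tick == p->tick)
		return false;	/* already counted during this tick */

	b->hit_count++;
	b->last_tick = p->tick;
	b->seen = true;
	return true;
}
```

With this gating, any number of bios hitting one block between two
ticks raises its hit count by one only.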


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect::

	'sequential_threshold <#nr_sequential_ios>'
	'random_threshold <#nr_random_ios>'
	'read_promote_adjustment <value>'
	'write_promote_adjustment <value>'
	'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads.  smq also does not have any cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.
Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory; 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This meant the bottom
levels generally had the most entries, and the top ones had very
few.  Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead, it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block.  For a
different block to get promoted to the cache, its hit count had to
exceed the lowest currently in the cache.  This meant it could take a
long time for the cache to adapt between varying IO patterns.

smq doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition, it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.
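
Returning to the level-balancing scheme described earlier, the
swap-on-hit idea can be sketched in a few lines of C.  This is a toy
model, not the smq implementation: the names (``toy_mq``, ``toy_hit``
and so on), the fixed sizes and the timestamp-based LRU search are all
assumptions made for illustration.  It only shows that swapping a hit
entry with the least recently used entry of the level above keeps the
population of every level constant.

```c
#include <assert.h>
#include <stdint.h>

#define NR_ENTRIES 6
#define NR_LEVELS  3

/* Hypothetical entry: no hit count, only a level and a recency stamp. */
struct toy_entry {
	unsigned level;		/* 0 is the coldest level */
	uint64_t last_used;	/* logical time of the most recent hit */
};

struct toy_mq {
	struct toy_entry e[NR_ENTRIES];
	uint64_t clock;		/* logical clock, bumped on every hit */
};

/* Spread the entries evenly: two entries per level in this toy setup. */
static void toy_init(struct toy_mq *q)
{
	q->clock = 0;
	for (int i = 0; i < NR_ENTRIES; i++) {
		q->e[i].level = (unsigned)i / (NR_ENTRIES / NR_LEVELS);
		q->e[i].last_used = 0;
	}
}

/* Index of the least recently used entry in 'level', or -1 if none. */
static int toy_lru_in_level(const struct toy_mq *q, unsigned level)
{
	int best = -1;

	for (int i = 0; i < NR_ENTRIES; i++) {
		if (q->e[i].level != level)
			continue;
		if (best < 0 || q->e[i].last_used < q->e[best].last_used)
			best = i;
	}
	return best;
}

/* Number of entries currently sitting in 'level'. */
static unsigned toy_level_size(const struct toy_mq *q, unsigned level)
{
	unsigned n = 0;

	for (int i = 0; i < NR_ENTRIES; i++)
		if (q->e[i].level == level)
			n++;
	return n;
}

/*
 * On a hit, swap the entry with the least recently used entry of the
 * level above.  Entries already in the top level just get fresher.
 */
static void toy_hit(struct toy_mq *q, int idx)
{
	assert(idx >= 0 && idx < NR_ENTRIES);

	struct toy_entry *hit = &q->e[idx];

	hit->last_used = ++q->clock;
	if (hit->level + 1 < NR_LEVELS) {
		int victim = toy_lru_in_level(q, hit->level + 1);

		if (victim >= 0) {
			q->e[victim].level = hit->level;  /* demote victim */
			hit->level++;                     /* promote hit */
		}
	}
}
```

Because every promotion is paired with a demotion, the number of
entries per level is fixed by the initial layout rather than by the
distribution of hit counts.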

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.

Examples
========

The syntax for a table is::

	cache <metadata dev> <cache dev> <origin dev> <block size>
	<#feature_args> [<feature arg>]*
	<policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is::

	dmsetup message <mapped device> 0 sequential_threshold 1024
	dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup::

	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
	creates a 128GB large mapped device named 'blah' with the
	sequential threshold set to 1024 and the random_threshold set to 8.
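
The numbers in the example above can be checked with a little
arithmetic, sketched here in C.  Table lengths and the cache block size
are given in 512-byte sectors, so 268435456 sectors is 128 GiB and a
block size of 512 sectors is 256 KiB.  The helper names are
hypothetical, and the per-block figures (88 bytes for mq, roughly 25
bytes for smq) are simply the ones quoted in the "Memory usage"
section, so the memory estimate is approximate.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL	/* device-mapper tables count 512-byte sectors */

/* Convert a length in sectors, as used in a DM table, to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_SIZE;
}

/* Number of whole cache blocks that fit on a cache device. */
static uint64_t nr_cache_blocks(uint64_t cache_dev_sectors,
				uint64_t block_size_sectors)
{
	assert(block_size_sectors > 0);
	return cache_dev_sectors / block_size_sectors;
}

/*
 * Rough policy memory estimate: blocks times bytes per block.
 * Pass 88 for the old mq figure or 25 for the approximate smq figure.
 */
static uint64_t policy_memory_bytes(uint64_t nr_blocks,
				    uint64_t bytes_per_block)
{
	return nr_blocks * bytes_per_block;
}
```

As a usage example, a hypothetical 20 GiB cache device is 41943040
sectors; with 512-sector blocks that gives 81920 cache blocks, so
``policy_memory_bytes(81920, 88)`` is 7208960 bytes and
``policy_memory_bytes(81920, 25)`` is 2048000 bytes.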