===================
Block io priorities
===================


Intro
-----

With the introduction of cfq v3 (aka cfq-ts or time sliced cfq), basic io
priorities are supported for reads on files. This enables users to io nice
processes or process groups, similar to what has been possible with cpu
scheduling for ages. This document mainly details the current possibilities
with cfq; other io schedulers do not support io priorities thus far.

Scheduling classes
------------------

CFQ implements three generic scheduling classes that determine how io is
served for a process.

IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given
higher priority than any other in the system; processes from this class are
given first access to the disk every time. Thus it needs to be used with some
care, since one io RT process can starve the entire system. Within the RT class,
there are 8 levels of class data that determine exactly how much time this
process needs the disk for on each service.
In the future this might change
to be more directly mappable to performance, by passing in a wanted data
rate instead.

IOPRIO_CLASS_BE: This is the best-effort scheduling class, which is the default
for any process that hasn't set a specific io priority. The class data
determines how much io bandwidth the process will get; it is directly mappable
to the cpu nice levels, just more coarsely implemented. 0 is the highest
BE prio level, 7 is the lowest. The mapping between cpu nice level and io
nice level is determined as: io_nice = (cpu_nice + 20) / 5.

IOPRIO_CLASS_IDLE: This is the idle scheduling class; processes running at this
level only get io time when no one else needs the disk. The idle class has no
class data, since it doesn't really apply here.

Tools
-----

See below for a sample ionice tool. Usage::

	# ionice -c<class> -n<level> -p<pid>

If pid isn't given, the current process is assumed.
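
When picking a level for ``-n`` in the best-effort class, the cpu nice mapping
given under "Scheduling classes" can serve as a guide. The following is a
minimal sketch of that formula only; the function name
``io_nice_from_cpu_nice`` is hypothetical and not part of any kernel or libc
interface, and the accepted range of -20 to 19 is the usual cpu nice range,
assumed here rather than stated in this document.

```c
#include <assert.h>

/*
 * Map a cpu nice level to a best-effort io nice level using the
 * formula from the text: io_nice = (cpu_nice + 20) / 5.
 *
 * Hypothetical helper for illustration. The input is assumed to lie
 * in the usual cpu nice range of -20..19, which yields 0..7.
 * Integer division truncates, so five neighbouring cpu nice levels
 * share one io nice level.
 */
static int io_nice_from_cpu_nice(int cpu_nice)
{
	assert(cpu_nice >= -20 && cpu_nice <= 19);
	return (cpu_nice + 20) / 5;
}
```

Under this mapping a cpu nice level of 0 corresponds to io nice level 4, which
is also the default level the sample ionice tool uses when ``-n`` is not given.
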
IO priority settings
are inherited on fork, so you can use ionice to start the process at a given
level::

	# ionice -c2 -n0 /bin/ls

will run ls at the best-effort scheduling class at the highest priority.
For a running process, you can give the pid instead::

	# ionice -c1 -n2 -p100

will change pid 100 to run at the realtime scheduling class, at priority 2.

ionice.c tool::

	#include <stdio.h>
	#include <stdlib.h>
	#include <errno.h>
	#include <getopt.h>
	#include <unistd.h>
	#include <sys/ptrace.h>
	#include <asm/unistd.h>

	extern int sys_ioprio_set(int, int, int);
	extern int sys_ioprio_get(int, int);

	#if defined(__i386__)
	#define __NR_ioprio_set		289
	#define __NR_ioprio_get		290
	#elif defined(__ppc__)
	#define __NR_ioprio_set		273
	#define __NR_ioprio_get		274
	#elif defined(__x86_64__)
	#define __NR_ioprio_set		251
	#define __NR_ioprio_get		252
	#elif defined(__ia64__)
	#define __NR_ioprio_set		1274
	#define __NR_ioprio_get		1275
	#else
	#error "Unsupported arch"
	#endif

	static inline int ioprio_set(int which, int who, int ioprio)
	{
		return syscall(__NR_ioprio_set, which, who, ioprio);
	}

	static inline int ioprio_get(int which, int who)
	{
		return syscall(__NR_ioprio_get, which, who);
	}

	enum {
		IOPRIO_CLASS_NONE,
		IOPRIO_CLASS_RT,
		IOPRIO_CLASS_BE,
		IOPRIO_CLASS_IDLE,
	};

	enum {
		IOPRIO_WHO_PROCESS = 1,
		IOPRIO_WHO_PGRP,
		IOPRIO_WHO_USER,
	};

	#define IOPRIO_CLASS_SHIFT	13

	const char *to_prio[] = { "none", "realtime", "best-effort", "idle", };

	int main(int argc, char *argv[])
	{
		int ioprio = 4, set = 0, ioprio_class = IOPRIO_CLASS_BE;
		int c, pid = 0;

		while ((c = getopt(argc, argv, "+n:c:p:")) != EOF) {
			switch (c) {
			case 'n':
				ioprio = strtol(optarg, NULL, 10);
				set = 1;
				break;
			case 'c':
				ioprio_class = strtol(optarg, NULL, 10);
				set = 1;
				break;
			case 'p':
				pid = strtol(optarg, NULL, 10);
				break;
			}
		}

		switch (ioprio_class) {
			case IOPRIO_CLASS_NONE:
				ioprio_class = IOPRIO_CLASS_BE;
				break;
			case IOPRIO_CLASS_RT:
			case IOPRIO_CLASS_BE:
				break;
			case IOPRIO_CLASS_IDLE:
				ioprio = 7;
				break;
			default:
				printf("bad prio class %d\n", ioprio_class);
				return 1;
		}

		if (!set) {
			if (!pid && argv[optind])
				pid = strtol(argv[optind], NULL, 10);

			ioprio = ioprio_get(IOPRIO_WHO_PROCESS, pid);

			printf("pid=%d, %d\n", pid, ioprio);

			if (ioprio == -1)
				perror("ioprio_get");
			else {
				ioprio_class = ioprio >> IOPRIO_CLASS_SHIFT;
				ioprio = ioprio & 0xff;
				printf("%s: prio %d\n", to_prio[ioprio_class], ioprio);
			}
		} else {
			if (ioprio_set(IOPRIO_WHO_PROCESS, pid, ioprio | ioprio_class << IOPRIO_CLASS_SHIFT) == -1) {
				perror("ioprio_set");
				return 1;
			}

			if (argv[optind])
				execvp(argv[optind], &argv[optind]);
		}

		return 0;
	}


March 11 2005, Jens Axboe <jens.axboe@oracle.com>
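
The sample tool passes the class and the level to ``ioprio_set`` as a single
integer: the class is shifted left by ``IOPRIO_CLASS_SHIFT`` and OR-ed with
the level, and the value returned by ``ioprio_get`` is split the same way. A
minimal sketch of that packing, kept separate from the syscalls so it can be
checked in isolation, might look as follows. The helper names and
``IOPRIO_DATA_MASK`` are hypothetical; the sample above masks with ``0xff``
instead, which gives the same result for levels 0 to 7.

```c
#include <assert.h>

#define IOPRIO_CLASS_SHIFT	13
/* Hypothetical name: selects the bits below the class field. */
#define IOPRIO_DATA_MASK	((1 << IOPRIO_CLASS_SHIFT) - 1)

/* Combine a class (0..3) and a level (0..7) into one ioprio value. */
static int ioprio_pack(int ioprio_class, int level)
{
	assert(ioprio_class >= 0 && ioprio_class <= 3);
	assert(level >= 0 && level <= 7);
	return (ioprio_class << IOPRIO_CLASS_SHIFT) | level;
}

/* Extract the scheduling class from a packed ioprio value. */
static int ioprio_unpack_class(int ioprio)
{
	return ioprio >> IOPRIO_CLASS_SHIFT;
}

/* Extract the level (class data) from a packed ioprio value. */
static int ioprio_unpack_level(int ioprio)
{
	return ioprio & IOPRIO_DATA_MASK;
}
```

With these helpers, ``ionice -c1 -n2`` corresponds to ``ioprio_pack(1, 2)``,
and ``ionice -c2 -n0`` to ``ioprio_pack(2, 0)``.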