===================
Block io priorities
===================


Intro
-----

With the introduction of cfq v3 (aka cfq-ts or time-sliced cfq), basic io
priorities are supported for reads on files. This enables users to io nice
processes or process groups, much as cpu nice has long allowed for cpu
scheduling. This document mainly details what is currently possible with
cfq; other io schedulers do not support io priorities so far.

Scheduling classes
------------------

CFQ implements three generic scheduling classes that determine how io is
served for a process.

IOPRIO_CLASS_RT: This is the realtime io class. It is given higher priority
than any other class in the system: processes in this class get first access
to the disk every time. It therefore needs to be used with some care, since
a single RT io process can starve the entire system. Within the RT class,
there are 8 levels of class data that determine how much disk time the
process gets on each service. In the future this might change to be more
directly mappable to performance, by passing in a desired data rate instead.

IOPRIO_CLASS_BE: This is the best-effort scheduling class, which is the default
for any process that hasn't set a specific io priority. The class data
determines how much io bandwidth the process will get. It maps directly onto
the cpu nice levels, only more coarsely: 0 is the highest BE prio level and
7 is the lowest. The mapping between cpu nice level and io nice level is
determined, using integer division, as: io_nice = (cpu_nice + 20) / 5.

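Because the division in the mapping truncates, each io nice level covers five
consecutive cpu nice values. A minimal sketch of the formula in C, assuming
cpu nice values in the usual range of -20 to 19; the function name
``cpu_nice_to_io_nice`` is made up for illustration and is not a kernel
interface:

```c
/*
 * Illustrative helper, not a kernel API: map a cpu nice value
 * (assumed to be in -20..19) to a best-effort io nice level (0..7)
 * with the formula from the text, io_nice = (cpu_nice + 20) / 5.
 */
int cpu_nice_to_io_nice(int cpu_nice)
{
	/* cpu_nice + 20 is non-negative in the assumed range, so the
	 * integer division simply rounds down. */
	return (cpu_nice + 20) / 5;
}
```

With this sketch, cpu nice -20 maps to io nice level 0, the default cpu nice
of 0 maps to level 4, and cpu nice 19 maps to level 7.
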
IOPRIO_CLASS_IDLE: This is the idle scheduling class. Processes running at this
level only get io time when no one else needs the disk. The idle class has no
class data, since priority levels don't really apply here.

Tools
-----

See below for a sample ionice tool. Usage::

	# ionice -c<class> -n<level> -p<pid>

If pid isn't given, the current process is assumed. IO priority settings
are inherited on fork, so you can use ionice to start the process at a given
level::

	# ionice -c2 -n0 /bin/ls

will run ls at the best-effort scheduling class at the highest priority.
For a running process, you can give the pid instead::

	# ionice -c1 -n2 -p100

will change pid 100 to run at the realtime scheduling class, at priority 2.

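The class and level arguments can be checked before they are handed to the
kernel. The following is only a sketch under the rules stated in this
document (8 levels, 0 to 7, for the realtime and best-effort classes; no
class data for the idle class). The helper ``ioprio_args_valid`` and the
constant ``NR_LEVELS`` are hypothetical names, not part of any existing tool:

```c
/* Class numbers as accepted by the -c option. */
enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
};

/* Hypothetical constant: number of levels in the RT and BE classes. */
#define NR_LEVELS	8

/*
 * Hypothetical helper: return 1 if the class/level pair matches the
 * rules described in the text, 0 otherwise. IOPRIO_CLASS_NONE is
 * rejected here; a caller may prefer to treat it as best-effort.
 */
int ioprio_args_valid(int ioprio_class, int level)
{
	switch (ioprio_class) {
	case IOPRIO_CLASS_RT:
	case IOPRIO_CLASS_BE:
		return level >= 0 && level < NR_LEVELS;
	case IOPRIO_CLASS_IDLE:
		/* The idle class has no class data, so any level is ignored. */
		return 1;
	default:
		return 0;
	}
}
```

Under these assumptions, ``-c2 -n0`` and ``-c1 -n2`` from the examples above
are accepted, while ``-c2 -n8`` is rejected.
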
ionice.c tool::

  #include <stdio.h>
  #include <stdlib.h>
  #include <errno.h>
  #include <getopt.h>
  #include <unistd.h>
  #include <sys/ptrace.h>
  #include <asm/unistd.h>

  extern int sys_ioprio_set(int, int, int);
  extern int sys_ioprio_get(int, int);

  /* Fallback syscall numbers, used only if <asm/unistd.h> lacks them */
  #ifndef __NR_ioprio_set
  #if defined(__i386__)
  #define __NR_ioprio_set		289
  #define __NR_ioprio_get		290
  #elif defined(__ppc__)
  #define __NR_ioprio_set		273
  #define __NR_ioprio_get		274
  #elif defined(__x86_64__)
  #define __NR_ioprio_set		251
  #define __NR_ioprio_get		252
  #elif defined(__ia64__)
  #define __NR_ioprio_set		1274
  #define __NR_ioprio_get		1275
  #else
  #error "Unsupported arch"
  #endif
  #endif

  static inline int ioprio_set(int which, int who, int ioprio)
  {
	return syscall(__NR_ioprio_set, which, who, ioprio);
  }

  static inline int ioprio_get(int which, int who)
  {
	return syscall(__NR_ioprio_get, which, who);
  }

  enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE,
  };

  enum {
	IOPRIO_WHO_PROCESS = 1,
	IOPRIO_WHO_PGRP,
	IOPRIO_WHO_USER,
  };

  /* The class is stored in the bits above the class data */
  #define IOPRIO_CLASS_SHIFT	13

  const char *to_prio[] = { "none", "realtime", "best-effort", "idle", };

  int main(int argc, char *argv[])
  {
	int ioprio = 4, set = 0, ioprio_class = IOPRIO_CLASS_BE;
	int c, pid = 0;

	/* getopt() returns -1 once all options have been parsed */
	while ((c = getopt(argc, argv, "+n:c:p:")) != -1) {
		switch (c) {
		case 'n':
			ioprio = strtol(optarg, NULL, 10);
			set = 1;
			break;
		case 'c':
			ioprio_class = strtol(optarg, NULL, 10);
			set = 1;
			break;
		case 'p':
			pid = strtol(optarg, NULL, 10);
			break;
		}
	}

	switch (ioprio_class) {
	case IOPRIO_CLASS_NONE:
		ioprio_class = IOPRIO_CLASS_BE;
		break;
	case IOPRIO_CLASS_RT:
	case IOPRIO_CLASS_BE:
		break;
	case IOPRIO_CLASS_IDLE:
		ioprio = 7;
		break;
	default:
		printf("bad prio class %d\n", ioprio_class);
		return 1;
	}

	if (!set) {
		/* No -n or -c given: query and print the current priority */
		if (!pid && argv[optind])
			pid = strtol(argv[optind], NULL, 10);

		ioprio = ioprio_get(IOPRIO_WHO_PROCESS, pid);

		printf("pid=%d, %d\n", pid, ioprio);

		if (ioprio == -1)
			perror("ioprio_get");
		else {
			ioprio_class = ioprio >> IOPRIO_CLASS_SHIFT;
			ioprio = ioprio & 0xff;
			printf("%s: prio %d\n", to_prio[ioprio_class], ioprio);
		}
	} else {
		if (ioprio_set(IOPRIO_WHO_PROCESS, pid,
			       ioprio | ioprio_class << IOPRIO_CLASS_SHIFT) == -1) {
			perror("ioprio_set");
			return 1;
		}

		if (argv[optind]) {
			execvp(argv[optind], &argv[optind]);
			/* execvp() only returns on failure */
			perror("execvp");
			return 1;
		}
	}

	return 0;
  }


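The sample tool passes the class and the class data to the kernel as a single
integer, with the class shifted up by ``IOPRIO_CLASS_SHIFT`` bits. The
encoding can be sketched as a pair of helpers. The function names below are
illustrative and do not appear in the tool, and the level mask keeps all bits
below the shift rather than the ``0xff`` used above, which gives the same
result for levels 0 to 7:

```c
#define IOPRIO_CLASS_SHIFT	13

/* Illustrative helper: combine a class and a level into one value. */
int ioprio_pack(int ioprio_class, int level)
{
	return (ioprio_class << IOPRIO_CLASS_SHIFT) | level;
}

/* Illustrative helper: extract the class from a packed value. */
int ioprio_unpack_class(int ioprio)
{
	return ioprio >> IOPRIO_CLASS_SHIFT;
}

/* Illustrative helper: extract the level (class data) from a packed value. */
int ioprio_unpack_level(int ioprio)
{
	/* Keep only the bits below the class field. */
	return ioprio & ((1 << IOPRIO_CLASS_SHIFT) - 1);
}
```

For example, ``ionice -c1 -n2`` corresponds to the packed value
(1 << 13) | 2 = 8194, which unpacks to class 1 (realtime) and level 2.
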
March 11 2005, Jens Axboe <jens.axboe@oracle.com>