========
CPU load
========

Linux exports various bits of information via ``/proc/stat`` and
``/proc/uptime`` that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example::

    $ iostat
    Linux 2.6.18.3-exp (linmac)     02/20/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              10.01    0.00    2.92    5.44    0.00   81.63

    ...

Here the system reports that, over the default sampling period, it
spent 10.01% of the time doing work in user space, 2.92% in the
kernel, and was idle 81.63% of the time.

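As a rough illustration of how a tool might derive such percentages, the
sketch below parses the aggregate ``cpu`` line of ``/proc/stat`` and
computes one field's share of the ticks that elapsed between two samples.
The helper names ``parse_cpu_line()`` and ``field_pct()`` are made up for
this example, and the field order (user, nice, system, idle, iowait, irq,
softirq, steal) is assumed to match the running kernel; real tools handle
more cases than this::

	#include <stdio.h>

	/* Fields of the aggregate "cpu" line, counted in USER_HZ ticks. */
	enum { USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ, STEAL, NFIELDS };

	/* Parse one "cpu ..." line; returns 0 on success, -1 on failure. */
	static int parse_cpu_line(const char *line, unsigned long long v[NFIELDS])
	{
		int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
			       &v[USER], &v[NICE], &v[SYSTEM], &v[IDLE],
			       &v[IOWAIT], &v[IRQ], &v[SOFTIRQ], &v[STEAL]);

		return n == NFIELDS ? 0 : -1;
	}

	/* Share of field f between an older sample a and a newer sample b, in percent. */
	static double field_pct(const unsigned long long a[NFIELDS],
				const unsigned long long b[NFIELDS], int f)
	{
		unsigned long long total = 0;
		int i;

		for (i = 0; i < NFIELDS; i++)
			total += b[i] - a[i];
		return total ? 100.0 * (double)(b[f] - a[f]) / (double)total : 0.0;
	}

A caller would read the first line of ``/proc/stat`` twice, some interval
apart, parse both samples, and then ask for ``field_pct(a, b, USER)``,
``field_pct(a, b, IDLE)`` and so on. Only the aggregate line should be
passed in; the per-CPU ``cpu0``, ``cpu1``, ... lines need a different format.
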
In most cases the ``/proc/stat`` information reflects reality quite
closely. However, because of how and when the kernel collects this
data, sometimes it cannot be trusted at all.

So how is this information collected?  Whenever a timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state.  The problem with this is that the system could have
switched between various states multiple times between two timer
interrupts, yet the counter is incremented only for the last state.


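The effect of charging a whole tick to a single state can be sketched in
user space. The model below is not kernel code: ``struct segment``,
``account_interval()`` and the state names are hypothetical, and time is
measured in arbitrary units. It records both the time really spent in each
state and what tick-based accounting would charge::

	#include <stddef.h>

	enum state { ST_USER, ST_SYSTEM, ST_IDLE, ST_MAX };

	/* One stretch of time spent in a single state. */
	struct segment {
		enum state state;
		unsigned int len;
	};

	/*
	 * Account one interval between two ticks, made of n segments.
	 * real[] receives the time actually spent in each state;
	 * ticks[] is charged the whole interval for the last state only.
	 */
	static void account_interval(const struct segment *seg, size_t n,
				     unsigned int real[ST_MAX],
				     unsigned int ticks[ST_MAX])
	{
		unsigned int total = 0;
		size_t i;

		for (i = 0; i < n; i++) {
			real[seg[i].state] += seg[i].len;
			total += seg[i].len;
		}
		if (n)
			ticks[seg[n - 1].state] += total;
	}

For an interval split 60/30/10 between user, system and idle, ``real[]``
keeps that split while ``ticks[]`` attributes all 100 units to idle, simply
because idle was the state observed when the tick arrived.
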
Example
-------

Imagine a system with one task that periodically burns cycles in the
following manner::

     time line between two timer interrupts
    |--------------------------------------|
     ^                                    ^
     |_ something begins working          |
                                          |_ something goes to sleep
                                         (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
``/proc/stat`` (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.

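The scenario above can be reproduced with a small user-space model. This
is only a sketch under simplifying assumptions: ``simulate()`` is a
hypothetical helper, time advances in whole units, the tick fires at
offset 0 of every period, and the task runs during ``[start, end)`` of
each period::

	/*
	 * Model nticks timer periods of tick_len time units each.
	 * *sampled is the load seen by sampling at each tick,
	 * *actual is the load really generated, both in percent.
	 */
	static void simulate(unsigned int tick_len, unsigned int start,
			     unsigned int end, unsigned int nticks,
			     double *sampled, double *actual)
	{
		unsigned long busy_ticks = 0, busy_time = 0;
		unsigned long now, total = (unsigned long)tick_len * nticks;

		for (now = 0; now < total; now++) {
			unsigned int off = now % tick_len;
			int running = off >= start && off < end;

			busy_time += running;
			/* The tick only sees what runs at offset 0. */
			if (off == 0)
				busy_ticks += running;
		}
		*sampled = nticks ? 100.0 * busy_ticks / nticks : 0.0;
		*actual = total ? 100.0 * busy_time / total : 0.0;
	}

With ``tick_len = 100``, ``start = 1`` and ``end = 100``, the model reports
a sampled load of 0% while the task is actually running 99% of the time.
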
One can imagine many more situations in which this behavior of the kernel
leads to quite erratic information in ``/proc/stat``. The following
program, for example, burns cycles for roughly two thirds of each interval
between two ``SIGALRM`` signals and sleeps for the rest::


	/* gcc -o hog smallhog.c */
	#include <time.h>
	#include <limits.h>
	#include <signal.h>
	#include <sys/time.h>
	#define HIST 10

	static volatile sig_atomic_t stop;

	/* SIGALRM handler: tell hog() to stop spinning. */
	static void sighandler(int signr)
	{
		(void) signr;
		stop = 1;
	}

	/* Spin for up to niters iterations; return the iterations left. */
	static unsigned long hog(unsigned long niters)
	{
		stop = 0;
		while (!stop && --niters)
			;
		return niters;
	}

	int main(void)
	{
		int i;
		struct itimerval it = {
			.it_interval = { .tv_sec = 0, .tv_usec = 1 },
			.it_value    = { .tv_sec = 0, .tv_usec = 1 } };
		sigset_t set;
		unsigned long v[HIST];
		double tmp = 0.0;
		unsigned long n;

		signal(SIGALRM, &sighandler);
		setitimer(ITIMER_REAL, &it, NULL);

		/* Calibrate: average the iterations that fit in one timer period. */
		hog(ULONG_MAX);
		for (i = 0; i < HIST; ++i)
			v[i] = ULONG_MAX - hog(ULONG_MAX);
		for (i = 0; i < HIST; ++i)
			tmp += v[i];
		tmp /= HIST;
		/* Burn about two thirds of each period. */
		n = tmp - (tmp / 3.0);

		sigemptyset(&set);
		sigaddset(&set, SIGALRM);

		/* Hog, then sleep until the next SIGALRM. */
		for (;;) {
			hog(n);
			sigwait(&set, &i);
		}
		return 0;
	}


References
----------

- https://lore.kernel.org/r/loom.20070212T063225-663@post.gmane.org
- Documentation/filesystems/proc.rst (1.8)


Thanks
------

Con Kolivas, Pavel Machek