=====================
Overcommit Accounting
=====================

The Linux kernel supports the following overcommit handling modes:

0
	Heuristic overcommit handling. Obvious overcommits of address
	space are refused. Suitable for a typical system. It ensures that
	a seriously wild allocation fails, while still allowing overcommit
	to reduce swap usage. The root user is allowed to allocate slightly
	more memory in this mode. This is the default.

1
	Always overcommit. Appropriate for some scientific
	applications. A classic example is code that uses sparse arrays
	and relies on virtual memory consisting almost entirely of zero
	pages.

2
	Don't overcommit. The total address space committed for the
	system is not permitted to exceed swap plus a configurable amount
	(default is 50%) of physical RAM. Depending on the amount you
	configure, in most situations this means a process will not be
	killed while accessing pages, but will instead receive errors on
	memory allocation as appropriate.

	Useful for applications that want to guarantee their memory
	allocations will be available in the future without having to
	initialize every page.
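
The three policies can be summarised as a single decision. The sketch below is a simplified Python model, not kernel code: all quantities are in pages, ``may_commit`` and the ``OVERCOMMIT_*`` names are hypothetical, and the mode 0 test assumes the behaviour of recent kernels (refuse only a single request larger than RAM plus swap), which may differ on older kernels. It also ignores the extra reserves the kernel keeps back, for example for root.

```python
# Hypothetical names for the three vm.overcommit_memory values.
OVERCOMMIT_GUESS = 0   # heuristic (default)
OVERCOMMIT_ALWAYS = 1  # always overcommit
OVERCOMMIT_NEVER = 2   # strict accounting, never overcommit


def may_commit(mode, request, committed, limit, ram_pages, swap_pages):
    """Model whether a new commitment of `request` pages is accepted.

    `committed` is the system-wide commit charge before the request and
    `limit` is the mode 2 commit limit; all values are in pages.
    """
    if mode == OVERCOMMIT_ALWAYS:
        # Mode 1: accounting never refuses a request.
        return True
    if mode == OVERCOMMIT_GUESS:
        # Mode 0 (assumed recent-kernel behaviour): refuse only a single
        # request larger than all of RAM plus swap.
        return request <= ram_pages + swap_pages
    if mode == OVERCOMMIT_NEVER:
        # Mode 2: the new system-wide total must stay below the limit.
        return committed + request < limit
    raise ValueError(f"unknown overcommit mode: {mode}")


# 1000 pages of RAM, 500 pages of swap, 900 pages already committed,
# limit = 500 (swap) + 50% of 1000 (RAM) = 1000 pages.
print(may_commit(OVERCOMMIT_GUESS, 200, 900, 1000, 1000, 500))  # True
print(may_commit(OVERCOMMIT_NEVER, 200, 900, 1000, 1000, 500))  # False
```

The same 200-page request is accepted under the heuristic but refused under strict accounting, because mode 2 looks at the system-wide total rather than at the request alone.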

The overcommit policy is set via the sysctl ``vm.overcommit_memory``.

The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value). These only have an effect
when ``vm.overcommit_memory`` is set to 2.
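
How the two knobs combine into a limit can be sketched as follows. This is a hedged Python model in kB, not the kernel's code: ``commit_limit_kb`` and ``read_vm_sysctl`` are hypothetical helpers, only one of ``overcommit_ratio``/``overcommit_kbytes`` is assumed to be in effect at a time, and the exclusion of the hugetlb pool from the ratio base is an assumption about the kernel's calculation. The kernel works in pages, so real values may differ by rounding.

```python
from pathlib import Path


def read_vm_sysctl(name, root=Path("/proc/sys/vm")):
    """Read an integer vm.* sysctl such as 'overcommit_ratio'."""
    return int((root / name).read_text().split()[0])


def commit_limit_kb(ram_kb, swap_kb, ratio=50, kbytes=0, hugetlb_kb=0):
    """Approximate the mode 2 commit limit in kB.

    A non-zero overcommit_kbytes is used as-is; otherwise `ratio`
    percent of RAM (excluding the hugetlb pool) is allowed. Swap is
    always added on top.
    """
    if kbytes:
        allowed = kbytes
    else:
        allowed = (ram_kb - hugetlb_kb) * ratio // 100
    return allowed + swap_kb


if __name__ == "__main__":
    # Print the current settings on a Linux host, if they are visible.
    for name in ("overcommit_memory", "overcommit_ratio", "overcommit_kbytes"):
        if (Path("/proc/sys/vm") / name).exists():
            print(f"vm.{name} =", read_vm_sysctl(name))
```

For example, 8,000,000 kB of RAM and 2,000,000 kB of swap with the default ratio of 50 give a limit of 6,000,000 kB. On a test machine, ``sysctl -w vm.overcommit_memory=2`` (as root) switches to strict accounting.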

The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as ``CommitLimit`` and ``Committed_AS``, respectively.
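
A minimal way to read both fields and estimate the remaining headroom is sketched below, assuming the usual ``Name:   value kB`` layout of ``/proc/meminfo``. ``parse_meminfo`` and ``commit_headroom_kb`` are hypothetical helpers. The headroom is only meaningful as a hard limit in mode 2; in modes 0 and 1 ``Committed_AS`` may legitimately exceed ``CommitLimit``, so the result can be negative.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            # Most values are in kB; a few (e.g. HugePages_Total) are counts.
            info[key.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """kB that can still be committed before CommitLimit is reached."""
    return info["CommitLimit"] - info["Committed_AS"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        if "CommitLimit" in info and "Committed_AS" in info:
            print("CommitLimit: ", info["CommitLimit"], "kB")
            print("Committed_AS:", info["Committed_AS"], "kB")
            print("headroom:    ", commit_headroom_kb(info), "kB")
```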

Gotchas
=======

Stack growth in C programs does an implicit mremap. If you want absolute
guarantees and run close to the edge, you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really care.

In mode 2, the ``MAP_NORESERVE`` flag is ignored.
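
The effect of that rule on the charge for a mapping that would otherwise be accounted (for example a private writable one) can be modelled as below. This is a hedged sketch: ``charged_pages`` is a hypothetical helper, and it assumes ``MAP_NORESERVE`` is honoured in modes 0 and 1.

```python
OVERCOMMIT_NEVER = 2  # vm.overcommit_memory value for strict accounting


def charged_pages(mapping_pages, map_noreserve, mode):
    """Pages charged for an otherwise accountable mapping.

    MAP_NORESERVE skips the charge only when overcommit is allowed
    (modes 0 and 1); in mode 2 the flag is ignored and the full size
    is charged.
    """
    if map_noreserve and mode != OVERCOMMIT_NEVER:
        return 0
    return mapping_pages


print(charged_pages(256, True, 0))  # 0
print(charged_pages(256, True, 2))  # 256
```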


How It Works
============

Overcommit accounting is based on the following rules:

For a file-backed map
	| SHARED or READ-only	-	0 cost (the file, not swap, backs the map)
	| PRIVATE WRITABLE	-	size of mapping per instance

For an anonymous or ``/dev/zero`` map
	| SHARED			-	size of mapping
	| PRIVATE READ-only	-	0 cost (but of little use)
	| PRIVATE WRITABLE	-	size of mapping per instance

Additional accounting
	| Pages made writable copies by mmap
	| shmfs memory drawn from the same pool
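
The per-mapping rules above translate directly into a small cost function. The sketch is a Python model of the table, not kernel code; the ``Mapping`` type and ``commit_cost`` are hypothetical, sizes are in bytes, and "per instance" means each process holding such a mapping is charged separately.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical description of one mmap() region."""
    size: int          # bytes
    file_backed: bool  # False for anonymous or /dev/zero maps
    shared: bool       # MAP_SHARED rather than MAP_PRIVATE
    writable: bool     # PROT_WRITE requested


def commit_cost(m):
    """Bytes charged against the commit limit, per the rules above."""
    if m.file_backed:
        # Shared or read-only file maps are backed by the file itself.
        if m.shared or not m.writable:
            return 0
        return m.size  # private writable: private copies may need swap
    # Anonymous or /dev/zero maps.
    if m.shared:
        return m.size
    return m.size if m.writable else 0


MiB = 1 << 20
maps = [
    Mapping(64 * MiB, file_backed=True, shared=False, writable=False),  # e.g. program text
    Mapping(8 * MiB, file_backed=True, shared=False, writable=True),    # e.g. private data
    Mapping(32 * MiB, file_backed=False, shared=False, writable=True),  # e.g. heap
]
print(sum(commit_cost(m) for m in maps) // MiB)  # 40
```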

Status
======

*	We account mmap memory mappings
*	We account mprotect changes in commit
*	We account mremap changes in size
*	We account brk
*	We account munmap
*	We report the commit status in ``/proc``
*	Account and check on fork
*	Review stack handling/building on exec
*	SHMfs accounting
*	Implement actual limit enforcement
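
Two of these items, the mprotect and mremap accounting, follow from the cost rules: a private mapping that becomes writable starts to cost its size, and resizing an accounted mapping changes the charge by the size difference. The sketch below is a simplified Python model under those assumptions only; ``mprotect_charge`` and ``mremap_charge`` are hypothetical helpers, and kernel details such as ``MAP_NORESERVE`` and huge pages are left out.

```python
def mprotect_charge(size, shared, accounted, now_writable):
    """Extra bytes charged when mprotect() changes protections.

    A private mapping that was not yet charged (e.g. it was read-only)
    costs its full size once it becomes writable; shared and already
    charged mappings cost nothing extra.
    """
    if now_writable and not shared and not accounted:
        return size
    return 0


def mremap_charge(old_size, new_size, accounted):
    """Change in charged bytes when mremap() resizes a mapping.

    Growth is charged and shrinkage released, but only for mappings
    that are accounted in the first place.
    """
    return new_size - old_size if accounted else 0


print(mprotect_charge(4096, shared=False, accounted=False, now_writable=True))  # 4096
print(mremap_charge(8192, 4096, accounted=True))  # -4096
```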

To Do
=====

*	Account ptrace pages (this is hard)