=====================
Overcommit Accounting
=====================

The Linux kernel supports the following overcommit handling modes:

0
        Heuristic overcommit handling. Obvious overcommits of address
        space are refused. Used for a typical system. It ensures a
        seriously wild allocation fails while allowing overcommit to
        reduce swap usage. root is allowed to allocate slightly more
        memory in this mode. This is the default.

1
        Always overcommit. Appropriate for some scientific
        applications. Classic example is code using sparse arrays and
        just relying on the virtual memory consisting almost entirely
        of zero pages.

2
        Don't overcommit. The total address space commit for the
        system is not permitted to exceed swap + a configurable amount
        (default is 50%) of physical RAM. Depending on the amount you
        use, in most situations this means a process will not be
        killed while accessing pages but will receive errors on memory
        allocation as appropriate.

        Useful for applications that want to guarantee their memory
        allocations will be available in the future without having to
        initialize every page.

The overcommit policy is set via the sysctl ``vm.overcommit_memory``.
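
As a rough illustration of the mode 2 rule above (swap plus a configurable
percentage of physical RAM), the limit can be sketched as follows. The
function names ``commit_limit_kb`` and ``would_be_refused`` are hypothetical
and exist only for this example. The sketch is a simplification: the kernel
also excludes huge pages from the RAM figure and keeps some memory in
reserve, which is not modelled here.

```python
def commit_limit_kb(ram_kb, swap_kb, ratio_percent=50):
    """Approximate mode 2 limit: swap + ratio_percent% of physical RAM (kB).

    Simplified model of the rule described in the text, not kernel code.
    """
    return swap_kb + ram_kb * ratio_percent // 100


def would_be_refused(committed_kb, request_kb, limit_kb):
    """True if a new request would push the total commit past the limit."""
    return committed_kb + request_kb > limit_kb


# Example: 8 GiB of RAM, 2 GiB of swap, default 50% ratio.
limit = commit_limit_kb(ram_kb=8 * 1024 * 1024, swap_kb=2 * 1024 * 1024)
print(limit)
print(would_be_refused(committed_kb=6_000_000, request_kb=500_000, limit_kb=limit))
```

With these numbers the limit is 2 GiB + 4 GiB = 6 GiB, so a request that
takes the total past that point fails at allocation time rather than later,
when the pages are touched.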

The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value). These only have an effect
when ``vm.overcommit_memory`` is set to 2.

The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as CommitLimit and Committed_AS respectively.

Gotchas
=======

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really care.

In mode 2 the MAP_NORESERVE flag is ignored.
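
The CommitLimit and Committed_AS fields mentioned earlier can be compared
from user space. The following is a minimal sketch that parses
``/proc/meminfo``-style text; the helper names ``parse_meminfo`` and
``commit_headroom_kb`` and the sample values are made up for illustration.

```python
def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(text):
    """Return CommitLimit - Committed_AS in kB (may be negative)."""
    fields = parse_meminfo(text)
    return fields["CommitLimit"] - fields["Committed_AS"]


# Illustrative sample only; real values come from /proc/meminfo.
SAMPLE = """\
MemTotal:        8388608 kB
SwapTotal:       2097152 kB
CommitLimit:     6291456 kB
Committed_AS:    1500000 kB
"""

print(commit_headroom_kb(SAMPLE))
```

On a running Linux system the same function can be given the contents of
``/proc/meminfo``, for example ``commit_headroom_kb(open("/proc/meminfo").read())``.
The result is only an enforced bound when ``vm.overcommit_memory`` is 2, and
it can be negative in the other modes.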


How It Works
============

The overcommit is based on the following rules:

For a file backed map
        | SHARED or READ-only   -       0 cost (the file is the map not swap)
        | PRIVATE WRITABLE      -       size of mapping per instance

For an anonymous or ``/dev/zero`` map
        | SHARED                -       size of mapping
        | PRIVATE READ-only     -       0 cost (but of little use)
        | PRIVATE WRITABLE      -       size of mapping per instance

Additional accounting
        | Pages made writable copies by mmap
        | shmfs memory drawn from the same pool

Status
======

* We account mmap memory mappings
* We account mprotect changes in commit
* We account mremap changes in size
* We account brk
* We account munmap
* We report the commit status in /proc
* Account and check on fork
* Review stack handling/building on exec
* SHMfs accounting
* Implement actual limit enforcement

To Do
=====

* Account ptrace pages (this is hard)
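
For reference, the per-mapping rules listed under "How It Works" can be
written as a small decision function. This is only a sketch of the table in
this document, not the kernel's implementation; the name ``commit_charge``
and its parameters are hypothetical.

```python
def commit_charge(size, *, file_backed, shared, writable):
    """Return the commit charge (same unit as size) for one mapping.

    Mirrors the rules table: file backed maps cost nothing unless they are
    private and writable; anonymous or /dev/zero maps cost their full size
    unless they are private and read-only.
    """
    if file_backed:
        if shared or not writable:
            return 0        # the file is the backing store, not swap
        return size         # private writable: charged per instance
    if shared:
        return size         # shared anonymous memory is always charged
    return size if writable else 0


# A private writable mapping is charged in full, whatever backs it.
print(commit_charge(1 << 20, file_backed=True, shared=False, writable=True))
# A shared file mapping is free, even when writable.
print(commit_charge(1 << 20, file_backed=True, shared=True, writable=True))
```

The "additional accounting" items (pages made writable copies by mmap, and
shmfs memory) are not covered by this function.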