Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
8b479335 |
| 22-Aug-2023 |
Stefan Roesch <shr@devkernel.io> |
proc/ksm: add ksm stats to /proc/pid/smaps
With madvise and prctl KSM can be enabled for different VMA's. Once it is enabled we can query how effective KSM is overall. However we cannot easily que
proc/ksm: add ksm stats to /proc/pid/smaps
With madvise and prctl KSM can be enabled for different VMA's. Once it is enabled we can query how effective KSM is overall. However we cannot easily query if an individual VMA benefits from KSM.
This commit adds a KSM section to the /prod/<pid>/smaps file. It reports how many of the pages are KSM pages. Note that KSM-placed zeropages are not included, only actual KSM pages.
Here is a typical output:
7f420a000000-7f421a000000 rw-p 00000000 00:00 0 Size: 262144 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 51212 kB Pss: 8276 kB Shared_Clean: 172 kB Shared_Dirty: 42996 kB Private_Clean: 196 kB Private_Dirty: 7848 kB Referenced: 15388 kB Anonymous: 51212 kB KSM: 41376 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 202016 kB SwapPss: 3882 kB Locked: 0 kB THPeligible: 0 ProtectionKey: 0 ksm_state: 0 ksm_skip_base: 0 ksm_skip_count: 0 VmFlags: rd wr mr mw me nr mg anon
This information also helps with the following workflow: - First enable KSM for all the VMA's of a process with prctl. - Then analyze with the above smaps report which VMA's benefit the most - Change the application (if possible) to add the corresponding madvise calls for the VMA's that benefit the most
[shr@devkernel.io: v5] Link: https://lkml.kernel.org/r/20230823170107.1457915-1-shr@devkernel.io Link: https://lkml.kernel.org/r/20230822180539.1424843-1-shr@devkernel.io Signed-off-by: Stefan Roesch <shr@devkernel.io> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.46 |
|
#
d56b699d |
| 14-Aug-2023 |
Bjorn Helgaas <bhelgaas@google.com> |
Documentation: Fix typos
Fix typos in Documentation.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jon
Documentation: Fix typos
Fix typos in Documentation.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
54007f81 |
| 12-Jun-2023 |
Yu-cheng Yu <yu-cheng.yu@intel.com> |
mm: Introduce VM_SHADOW_STACK for shadow stack memory
New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to i
mm: Introduce VM_SHADOW_STACK for shadow stack memory
New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to identify these areas, for example, to be used to properly indicate shadow stack PTEs to the hardware.
Shadow stack VMA creation will be tightly controlled and limited to anonymous memory to make the implementation simpler and since that is all that is required. The solution will rely on pte_mkwrite() to create the shadow stack PTEs, so it will not be required for vm_get_page_prot() to learn how to create shadow stack memory. For this reason document that VM_SHADOW_STACK should not be mixed with VM_SHARED.
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-15-rick.p.edgecombe%40intel.com
show more ...
|
Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25 |
|
#
522dc4e5 |
| 16-Apr-2023 |
Chunguang Wu <fullspring2018@gmail.com> |
fs/proc: add Kthread flag to /proc/$pid/status
The command `ps -ef ` and `top -c` mark kernel thread by '[' and ']', but sometimes the result is not correct. The task->flags in /proc/$pid/stat is g
fs/proc: add Kthread flag to /proc/$pid/status
The command `ps -ef ` and `top -c` mark kernel thread by '[' and ']', but sometimes the result is not correct. The task->flags in /proc/$pid/stat is good, but we need remember the value of PF_KTHREAD is 0x00200000 and convert dec to hex. If we have no binary program and shell script which read /proc/$pid/stat, we can know it directly by `cat /proc/$pid/status`.
Link: https://lkml.kernel.org/r/20230416052404.2920-1-fullspring2018@gmail.com Signed-off-by: Chunguang Wu <fullspring2018@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.24, v6.1.23, v6.1.22, v6.1.21 |
|
#
bd23024b |
| 21-Mar-2023 |
Tomas Mudrunka <tomas.mudrunka@gmail.com> |
mm/memtest: add results of early memtest to /proc/meminfo
Currently the memtest results were only presented in dmesg.
When running a large fleet of devices without ECC RAM it's currently not easy t
mm/memtest: add results of early memtest to /proc/meminfo
Currently the memtest results were only presented in dmesg.
When running a large fleet of devices without ECC RAM it's currently not easy to do bulk monitoring for memory corruption. You have to parse dmesg, but that's a ring buffer so the error might disappear after some time. In general I do not consider dmesg to be a great API to query RAM status.
In several companies I've seen such errors remain undetected and cause issues for way too long. So I think it makes sense to provide a monitoring API, so that we can safely detect and act upon them.
This adds /proc/meminfo entry which can be easily used by scripts.
Link: https://lkml.kernel.org/r/20230321103430.7130-1-tomas.mudrunka@gmail.com Signed-off-by: Tomas Mudrunka <tomas.mudrunka@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.20 |
|
#
d2ea66a6 |
| 14-Mar-2023 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: fs/proc: corrections and update
Update URL for the latest online version of this document. Correct "files" to "fields" in a few places. Update /proc/scsi, /proc/stat, and /proc/fs/ext
Documentation: fs/proc: corrections and update
Update URL for the latest online version of this document. Correct "files" to "fields" in a few places. Update /proc/scsi, /proc/stat, and /proc/fs/ext4 information. Drop /usr/src/ from the location of the kernel source tree.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230314060347.605-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1 |
|
#
8b0a211d |
| 09-Dec-2022 |
Yang Yang <yang.yang29@zte.com.cn> |
docs: proc.rst: add softnet_stat to /proc/net table
/proc/net/softnet_stat exists for a long time, but proc.rst miss it. Softnet_stat shows some statistics of struct softnet_data of online CPUs. Str
docs: proc.rst: add softnet_stat to /proc/net table
/proc/net/softnet_stat exists for a long time, but proc.rst miss it. Softnet_stat shows some statistics of struct softnet_data of online CPUs. Struct softnet_data manages incoming and output packets on per-CPU queues. Note that fastroute and cpu_collision in softnet_stat are obsolete and their value is always 0.
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Link: https://lore.kernel.org/r/202212091421536982085@zte.com.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79 |
|
#
d09e8ca6 |
| 14-Nov-2022 |
Pasha Tatashin <pasha.tatashin@soleen.com> |
mm: anonymous shared memory naming
Since commit 9a10064f5625 ("mm: add a field to store names for private anonymous memory"), name for private anonymous memory, but not shared anonymous, can be set.
mm: anonymous shared memory naming
Since commit 9a10064f5625 ("mm: add a field to store names for private anonymous memory"), name for private anonymous memory, but not shared anonymous, can be set. However, naming shared anonymous memory just as useful for tracking purposes.
Extend the functionality to be able to set names for shared anon.
There are two ways to create anonymous shared memory, using memfd or directly via mmap(): 1. fd = memfd_create(...) mem = mmap(..., MAP_SHARED, fd, ...) 2. mem = mmap(..., MAP_SHARED | MAP_ANONYMOUS, -1, ...)
In both cases the anonymous shared memory is created the same way by mapping an unlinked file on tmpfs.
The memfd way allows to give a name for anonymous shared memory, but not useful when parts of shared memory require to have distinct names.
Example use case: The VMM maps VM memory as anonymous shared memory (not private because VMM is sandboxed and drivers are running in their own processes). However, the VM tells back to the VMM how parts of the memory are actually used by the guest, how each of the segments should be backed (i.e. 4K pages, 2M pages), and some other information about the segments. The naming allows us to monitor the effective memory footprint for each of these segments from the host without looking inside the guest.
Sample output: /* Create shared anonymous segmenet */ anon_shmem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); /* Name the segment: "MY-NAME" */ rv = prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, anon_shmem, SIZE, "MY-NAME");
cat /proc/<pid>/maps (and smaps): 7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 [anon_shmem:MY-NAME]
If the segment is not named, the output is: 7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 /dev/zero (deleted)
Link: https://lkml.kernel.org/r/20221115020602.804224-1-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Colin Cross <ccross@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: xu xin <cgel.zte@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70 |
|
#
f1f1f256 |
| 22-Sep-2022 |
Ivan Babrou <ivan@cloudflare.com> |
proc: report open files as size in stat() for /proc/pid/fd
Many monitoring tools include open file count as a metric. Currently the only way to get this number is to enumerate the files in /proc/pi
proc: report open files as size in stat() for /proc/pid/fd
Many monitoring tools include open file count as a metric. Currently the only way to get this number is to enumerate the files in /proc/pid/fd.
The problem with the current approach is that it does many things people generally don't care about when they need one number for a metric. In our tests for cadvisor, which reports open file counts per cgroup, we observed that reading the number of open files is slow. Out of 35.23% of CPU time spent in `proc_readfd_common`, we see 29.43% spent in `proc_fill_cache`, which is responsible for filling dentry info. Some of this extra time is spinlock contention, but it's a contention for the lock we don't want to take to begin with.
We considered putting the number of open files in /proc/pid/status. Unfortunately, counting the number of fds involves iterating the open_files bitmap, which has a linear complexity in proportion with the number of open files (bitmap slots really, but it's close). We don't want to make /proc/pid/status any slower, so instead we put this info in /proc/pid/fd as a size member of the stat syscall result. Previously the reported number was zero, so there's very little risk of breaking anything, while still providing a somewhat logical way to count the open files with a fallback if it's zero.
RFC for this patch included iterating open fds under RCU. Thanks to Frank Hofmann for the suggestion to use the bitmap instead.
Previously:
``` $ sudo stat /proc/1/fd | head -n2 File: /proc/1/fd Size: 0 Blocks: 0 IO Block: 1024 directory ```
With this patch:
``` $ sudo stat /proc/1/fd | head -n2 File: /proc/1/fd Size: 65 Blocks: 0 IO Block: 1024 directory ```
Correctness check:
``` $ sudo ls /proc/1/fd | wc -l 65 ```
I added the docs for /proc/<pid>/fd while I'm at it.
[ivan@cloudflare.com: use bitmap_weight() to count the bits] Link: https://lkml.kernel.org/r/20221018045844.37697-1-ivan@cloudflare.com [akpm@linux-foundation.org: include linux/bitmap.h for bitmap_weight()] [ivan@cloudflare.com: return errno from proc_fd_getattr() instead of setting negative size] Link: https://lkml.kernel.org/r/20221024173140.30673-1-ivan@cloudflare.com Link: https://lkml.kernel.org/r/20220922224027.59266-1-ivan@cloudflare.com Signed-off-by: Ivan Babrou <ivan@cloudflare.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Cc: David Hildenbrand <david@redhat.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
df61e945 |
| 02-Nov-2022 |
Chen Linxuan <chenlinxuan@uniontech.com> |
Documentation: update the description of TracerPid in procfs.rst
When the tracer of process is outside of current pid namespace, field `TracerPid` in /proc/<pid>/status will be 0, too, just like thi
Documentation: update the description of TracerPid in procfs.rst
When the tracer of process is outside of current pid namespace, field `TracerPid` in /proc/<pid>/status will be 0, too, just like this process not have been traced.
This is because that function `task_pid_nr_ns` used to get the pid of tracer will return 0 in this situation.
Co-authored-by: Yuan Haisheng <heysion@deepin.com> Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com> Link: https://lore.kernel.org/r/20221102081517.19770-1-chenlinxuan@uniontech.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63 |
|
#
ebc97a52 |
| 22-Aug-2022 |
Yosry Ahmed <yosryahmed@google.com> |
mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
We keep track of several kernel memory stats (total kernel memory, page tables, stack, vmalloc, etc) on multiple levels (global, pe
mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.
We keep track of several kernel memory stats (total kernel memory, page tables, stack, vmalloc, etc) on multiple levels (global, per-node, per-memcg, etc). These stats give insights to users to how much memory is used by the kernel and for what purposes.
Currently, memory used by KVM mmu is not accounted in any of those kernel memory stats. This patch series accounts the memory pages used by KVM for page tables in those stats in a new NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account for other types of secondary pages tables (e.g. iommu page tables).
KVM has a decent number of large allocations that aren't for page tables, but for most of them, the number/size of those allocations scales linearly with either the number of vCPUs or the amount of memory assigned to the VM. KVM's secondary page table allocations do not scale linearly, especially when nested virtualization is in use.
From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to allocate a large number of page tables even though the guest isn't accessing that much memory). However, someone would need to either understand how KVM works to make that connection, or know (or be told) to go look at KVM's stats if they're running VMs to better decipher the stats.
Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE is informative. For example, when backing a VM with THP vs. HugeTLB, NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order of magnitude higher with THP. So having this stat will at the very least prove to be useful for understanding tradeoffs between VM backing types, and likely even steer folks towards potential optimizations.
The original discussion with more details about the rationale: https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org
This stat will be used by subsequent patches to count KVM mmu memory usage.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220823004639.2387269-2-yosryahmed@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
show more ...
|
Revision tags: v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49 |
|
#
cb55b838 |
| 16-Jun-2022 |
Yang Shi <shy828301@gmail.com> |
doc: proc: fix the description to THPeligible
The THPeligible bit shows 1 if and only if the VMA is eligible for allocating THP and the THP is also PMD mappable. Some misaligned file VMAs may be el
doc: proc: fix the description to THPeligible
The THPeligible bit shows 1 if and only if the VMA is eligible for allocating THP and the THP is also PMD mappable. Some misaligned file VMAs may be eligible for allocating THP but the THP can't be mapped by PMD. Make this more explicitly to avoid ambiguity.
Link: https://lkml.kernel.org/r/20220616174840.1202070-8-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Zach O'Keefe <zokeefe@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
30934843 |
| 20-Jun-2022 |
Vincent Whitchurch <vincent.whitchurch@axis.com> |
mm/smaps: add Pss_Dirty
Pss is the sum of the sizes of clean and dirty private pages, and the proportional sizes of clean and dirty shared pages:
Private = Private_Dirty + Private_Clean Shared_Pr
mm/smaps: add Pss_Dirty
Pss is the sum of the sizes of clean and dirty private pages, and the proportional sizes of clean and dirty shared pages:
Private = Private_Dirty + Private_Clean Shared_Proportional = Shared_Dirty_Proportional + Shared_Clean_Proportional Pss = Private + Shared_Proportional
The Shared*Proportional fields are not present in smaps, so it is not always possible to determine how much of the Pss is from dirty pages and how much is from clean pages. This information can be useful for measuring memory usage for the purpose of optimisation, since clean pages can usually be discarded by the kernel immediately while dirty pages cannot.
The smaps routines in the kernel already have access to this data, so add a Pss_Dirty to show it to userspace. Pss_Clean is not added since it can be calculated from Pss and Pss_Dirty.
Link: https://lkml.kernel.org/r/20220620081251.2928103-1-vincent.whitchurch@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
ee65728e |
| 27-Jun-2022 |
Mike Rapoport <rppt@linux.ibm.com> |
docs: rename Documentation/vm to Documentation/mm
so it will be consistent with code mm directory and with Documentation/admin-guide/mm and won't be confused with virtual machines.
Signed-off-by: M
docs: rename Documentation/vm to Documentation/mm
so it will be consistent with code mm directory and with Documentation/admin-guide/mm and won't be confused with virtual machines.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Tested-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Acked-by: Wu XiangCheng <bobwxc@email.cn>
show more ...
|
Revision tags: v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18 |
|
#
f6498b77 |
| 19-May-2022 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: zswap: add basic meminfo and vmstat coverage
Currently it requires poking at debugfs to figure out the size and population of the zswap cache on a host. There are no counters for reads and writ
mm: zswap: add basic meminfo and vmstat coverage
Currently it requires poking at debugfs to figure out the size and population of the zswap cache on a host. There are no counters for reads and writes against the cache. As a result, it's difficult to understand zswap behavior on production systems.
Print zswap memory consumption and how many pages are zswapped out in /proc/meminfo. Count zswapouts and zswapins in /proc/vmstat.
Link: https://lkml.kernel.org/r/20220510152847.230957-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
39799b64 |
| 19-May-2022 |
Johannes Weiner <hannes@cmpxchg.org> |
Documentation: filesystems: proc: update meminfo section
Patch series "zswap: accounting & cgroup control", v2.
Zswap can consume nearly a quarter of RAM in the default configuration, yet it's neit
Documentation: filesystems: proc: update meminfo section
Patch series "zswap: accounting & cgroup control", v2.
Zswap can consume nearly a quarter of RAM in the default configuration, yet it's neither listed in /proc/meminfo, nor is it accounted and manageable on a per-cgroup basis.
This makes reasoning about the memory situation on a host in general rather difficult. On shared/cgrouped hosts, the consequences are worse. First, workloads can escape memory containment and cause resource priority inversions: a lo-pri group can fill the global zswap pool and force a hi-pri group out to disk. Second, not all workloads benefit from zswap equally. Some even suffer when memory contents compress poorly, and are better off going to disk swap directly. On a host with mixed workloads, it's currently not possible to enable zswap for one workload but not for the other.
This series implements the missing global accounting as well as cgroup tracking & control for zswap backing memory:
- Patch 1 refreshes the very out-of-date meminfo documentation in Documentation/filesystems/proc.rst.
- Patches 2-4 clean up related and adjacent options in Kconfig. Not actual dependencies, just things I noticed during development.
- Patch 5 adds meminfo and vmstat coverage for zswap consumption and activity.
- Patch 6 implements per-cgroup tracking & control of zswap memory.
This patch (of 6):
Add new entries. Minor corrections and cleanups.
[hannes@cmpxchg.org: fix htmldocs warnings] Link: https://lkml.kernel.org/r/Ynve8dg4zJyhH2gW@cmpxchg.org [hannes@cmpxchg.org: change `Unevictable' wording, per David] Link: https://lkml.kernel.org/r/YnwFraZlVWQoCjz3@cmpxchg.org Link: https://lkml.kernel.org/r/20220510152847.230957-1-hannes@cmpxchg.org Link: https://lkml.kernel.org/r/20220510152847.230957-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.41 |
|
#
e24ccaaf |
| 15-May-2022 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
block: remove last remaining traces of IDE documentation
The last traces of the IDE driver went away in commit b7fb14d3ac63 ("ide: remove the legacy ide driver") but it left behind some traces of ol
block: remove last remaining traces of IDE documentation
The last traces of the IDE driver went away in commit b7fb14d3ac63 ("ide: remove the legacy ide driver") but it left behind some traces of old documentation.
As luck would have it Randy and I would submit similar changes within a week of each other to address this. As Randy's commit is in the doc tree already - this delta is just the stuff my removal contained that was not in Randy's IDE doc removal.
Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: https://lore.kernel.org/all/20220427165917.GE12977@windriver.com [phil@philpotter.co.uk: removed diffs already added by others] Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20220515205833.944139-5-phil@philpotter.co.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15 |
|
#
9a10064f |
| 14-Jan-2022 |
Colin Cross <ccross@google.com> |
mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications like Android uses heavily, there are multiple different allocators
mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications like Android uses heavily, there are multiple different allocators in use. At a minimum there is libc malloc and the stack, and in many cases there are libc malloc, the stack, direct syscalls to mmap anonymous memory, and multiple VM heaps (one for small objects, one for big objects, etc.). Each of these layers usually has its own tools to inspect its usage; malloc by compiling a debug version, the VM through heap inspection tools, and for direct syscalls there is usually no way to track them.
On Android we heavily use a set of tools that use an extended version of the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped in userspace and slice their usage by process, shared (COW) vs. unique mappings, backing, etc. This can account for real physical memory usage even in cases like fork without exec (which Android uses heavily to share as many private COW pages as possible between processes), Kernel SamePage Merging, and clean zero pages. It produces a measurement of the pages that only exist in that process (USS, for unique), and a measurement of the physical memory usage of that process with the cost of shared pages being evenly split between processes that share them (PSS).
If all anonymous memory is indistinguishable then figuring out the real physical memory usage (PSS) of each heap requires either a pagemap walking tool that can understand the heap debugging of every layer, or for every layer's heap debugging tools to implement the pagemap walking logic, in which case it is hard to get a consistent view of memory across the whole system.
Tracking the information in userspace leads to all sorts of problems. It either needs to be stored inside the process, which means every process has to have an API to export its current heap information upon request, or it has to be stored externally in a filesystem that somebody needs to clean up on crashes. It needs to be readable while the process is still running, so it has to have some sort of synchronization with every layer of userspace. Efficiently tracking the ranges requires reimplementing something like the kernel vma trees, and linking to it from every layer of userspace. It requires more memory, more syscalls, more runtime cost, and more complexity to separately track regions that the kernel is already tracking.
This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a userspace-provided name for anonymous vmas. The names of named anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)
Setting the name to NULL clears it. The name length limit is 80 bytes including NUL-terminator and is checked to contain only printable ascii characters (including space), except '[',']','\','$' and '`'.
Ascii strings are being used to have a descriptive identifiers for vmas, which can be understood by the users reading /proc/pid/maps or /proc/pid/smaps. Names can be standardized for a given system and they can include some variable parts such as the name of the allocator or a library, tid of the thread using it, etc.
The name is stored in a pointer in the shared union in vm_area_struct that points to a null terminated string. Anonymous vmas with the same name (equivalent strings) and are otherwise mergeable will be merged. The name pointers are not shared between vmas even if they contain the same name. The name pointer is stored in a union with fields that are only used on file-backed mappings, so it does not increase memory usage.
CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this feature. It keeps the feature disabled by default to prevent any additional memory overhead and to avoid confusing procfs parsers on systems which are not ready to support named anonymous vmas.
The patch is based on the original patch developed by Colin Cross, more specifically on its latest version [1] posted upstream by Sumit Semwal. It used a userspace pointer to store vma names. In that design, name pointers could be shared between vmas. However during the last upstreaming attempt, Kees Cook raised concerns [2] about this approach and suggested to copy the name into kernel memory space, perform validity checks [3] and store as a string referenced from vm_area_struct.
One big concern is about fork() performance which would need to strdup anonymous vma names. Dave Hansen suggested experimenting with worst-case scenario of forking a process with 64k vmas having longest possible names [4]. I ran this experiment on an ARM64 Android device and recorded a worst-case regression of almost 40% when forking such a process.
This regression is addressed in the followup patch which replaces the pointer to a name with a refcounted structure that allows sharing the name pointer between vmas of the same name. Instead of duplicating the string during fork() or when splitting a vma it increments the refcount.
[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/ [2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/ [3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/ [4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/
Changes for prctl(2) manual page (in the options section):
PR_SET_VMA Sets an attribute specified in arg2 for virtual memory areas starting from the address specified in arg3 and spanning the size specified in arg4. arg5 specifies the value of the attribute to be set. Note that assigning an attribute to a virtual memory area might prevent it from being merged with adjacent virtual memory areas due to the difference in that attribute's value.
Currently, arg2 must be one of:
PR_SET_VMA_ANON_NAME Set a name for anonymous virtual memory areas. arg5 should be a pointer to a null-terminated string containing the name. The name length including null byte cannot exceed 80 bytes. If arg5 is NULL, the name of the appropriate anonymous virtual memory areas will be reset. The name can contain only printable ascii characters (including space), except '[',']','\','$' and '`'.
This feature is available only if the kernel is built with the CONFIG_ANON_VMA_NAME option enabled.
[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table] Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com [surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy, added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the work here was done by Colin Cross, therefore, with his permission, keeping him as the author]
Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com Signed-off-by: Colin Cross <ccross@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Glauber <jan.glauber@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Landley <rob@landley.net> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: Shaohua Li <shli@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10 |
|
#
b0b719ce |
| 06-Oct-2021 |
Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> |
docs: proc.rst: mountinfo: align columns
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Link: https://lore.kernel.org/r/20211007040001.103413-3-mail@christoph.anton.mit
docs: proc.rst: mountinfo: align columns
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Link: https://lore.kernel.org/r/20211007040001.103413-3-mail@christoph.anton.mitterer.name Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
#
ff9c3d43 |
| 06-Oct-2021 |
Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> |
docs: proc.rst: mountinfo: improved field numbering
Without reading thoroughly, one could easily oversee that there may be several fields after #6. Made it more clearly by changing the field numberi
docs: proc.rst: mountinfo: improved field numbering
Without reading thoroughly, one could easily oversee that there may be several fields after #6. Made it more clearly by changing the field numbering.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Link: https://lore.kernel.org/r/20211007040001.103413-2-mail@christoph.anton.mitterer.name Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49 |
|
#
3845f256 |
| 30-Jun-2021 |
Kalesh Singh <kaleshsingh@google.com> |
procfs/dmabuf: add inode number to /proc/*/fdinfo
And 'ino' field to /proc/<pid>/fdinfo/<FD> and /proc/<pid>/task/<tid>/fdinfo/<FD>.
The inode numbers can be used to uniquely identify DMA buffers i
procfs/dmabuf: add inode number to /proc/*/fdinfo
And 'ino' field to /proc/<pid>/fdinfo/<FD> and /proc/<pid>/task/<tid>/fdinfo/<FD>.
The inode numbers can be used to uniquely identify DMA buffers in user space and avoids a dependency on /proc/<pid>/fd/* when accounting per-process DMA buffer sizes.
Link: https://lkml.kernel.org/r/20210308170651.919148-2-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Christian König <christian.koenig@amd.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Michel Lespinasse <walken@google.com> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Andrei Vagin <avagin@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
8d719afc |
| 30-Jun-2021 |
Mike Rapoport <rppt@linux.ibm.com> |
docs: proc.rst: meminfo: briefly describe gaps in memory accounting
Add a paragraph that explains that it may happen that the counters in /proc/meminfo do not add up to the overall memory usage.
Li
docs: proc.rst: meminfo: briefly describe gaps in memory accounting
Add a paragraph that explains that it may happen that the counters in /proc/meminfo do not add up to the overall memory usage.
Link: https://lkml.kernel.org/r/20210421061127.1182723-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20 |
|
#
1f7faca2 |
| 01-Mar-2021 |
Peter Xu <peterx@redhat.com> |
docs: filesystem: Update smaps vm flag list to latest
We've missed a few documentation when adding new VM_* flags. Add the missing pieces so they'll be in sync now.
Signed-off-by: Peter Xu <peterx
docs: filesystem: Update smaps vm flag list to latest
We've missed a few documentation when adding new VM_* flags. Add the missing pieces so they'll be in sync now.
Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20210302000646.432358-1-peterx@redhat.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v5.10.19, v5.4.101, v5.10.18 |
|
#
f37a15ea |
| 23-Feb-2021 |
Randy Dunlap <rdunlap@infradead.org> |
docs: proc.rst: fix indentation warning
Fix indentation snafu in proc.rst as reported by Stephen.
next-20210219/Documentation/filesystems/proc.rst:697: WARNING: Unexpected indentation.
Fixes: 93ea
docs: proc.rst: fix indentation warning
Fix indentation snafu in proc.rst as reported by Stephen.
next-20210219/Documentation/filesystems/proc.rst:697: WARNING: Unexpected indentation.
Fixes: 93ea4a0b8fce ("Documentation: proc.rst: add more about the 6 fields in loadavg") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20210223060418.21443-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
#
93ea4a0b |
| 21-Feb-2021 |
Randy Dunlap <rdunlap@infradead.org> |
Documentation: proc.rst: add more about the 6 fields in loadavg
Address Jon's feedback on the previous patch by adding info about field separators in the /proc/loadavg file.
Signed-off-by: Randy Du
Documentation: proc.rst: add more about the 6 fields in loadavg
Address Jon's feedback on the previous patch by adding info about field separators in the /proc/loadavg file.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210222034729.22350-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|