Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
e7716c74 |
| 21-Aug-2023 |
Philipp Stanner <pstanner@redhat.com> |
xarray: Document necessary flag in alloc functions
Adds a new line to the docstrings of functions wrapping __xa_alloc() and __xa_alloc_cyclic(), informing about the necessity of flag XA_FLAGS_ALLOC
xarray: Document necessary flag in alloc functions
Adds a new line to the docstrings of functions wrapping __xa_alloc() and __xa_alloc_cyclic(), informing about the necessity of flag XA_FLAGS_ALLOC being set previously.
The documentation so far says that functions wrapping __xa_alloc() and __xa_alloc_cyclic() are supposed to return either -ENOMEM or -EBUSY in case of an error. If the xarray has been initialized without the flag XA_FLAGS_ALLOC, however, they fail with a different, undocumented error code.
As hinted at in Documentation/core-api/xarray.rst, wrappers around these functions should only be invoked when the flag has been set. The functions' documentation should reflect that as well.
Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8 |
|
#
5649d113 |
| 18-Jan-2023 |
Yang Yang <yang.yang29@zte.com.cn> |
swap_state: update shadow_nodes for anonymous page
Shadow_nodes is for shadow nodes reclaiming of workingset handling, it is updated when page cache add or delete since long time ago workingset only
swap_state: update shadow_nodes for anonymous page
Shadow_nodes is for shadow nodes reclaiming of workingset handling, it is updated when page cache add or delete since long time ago workingset only supported page cache. But when workingset supports anonymous page detection, we missied updating shadow nodes for it. This caused that shadow nodes of anonymous page will never be reclaimd by scan_shadow_nodes() even they use much memory and system memory is tense.
So update shadow_nodes of anonymous page when swap cache is add or delete by calling xas_set_update(..workingset_update_node).
Link: https://lkml.kernel.org/r/202301182013032211005@zte.com.cn Fixes: aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU") Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53 |
|
#
1dd685c4 |
| 06-Jul-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Add calls to might_alloc()
Catch bogus GFP flags deterministically, instead of occasionally when we actually have to allocate memory.
Reported-by: Nikolay Borisov <nborisov@suse.com> Signed
XArray: Add calls to might_alloc()
Catch bogus GFP flags deterministically, instead of occasionally when we actually have to allocate memory.
Reported-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46 |
|
#
69a37a8b |
| 08-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/huge_memory: Fix xarray node memory leak
If xas_split_alloc() fails to allocate the necessary nodes to complete the xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem() then int
mm/huge_memory: Fix xarray node memory leak
If xas_split_alloc() fails to allocate the necessary nodes to complete the xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem() then interprets as "Please allocate more memory", not as "Please free any unnecessary memory" (which was the intended outcome). It's confusing to use xas_nomem() to free memory in this context, so call xas_destroy() instead.
Reported-by: syzbot+9e27a75a8c24f3fe75c1@syzkaller.appspotmail.com Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31 |
|
#
9bbdc0f3 |
| 22-Mar-2022 |
Muchun Song <songmuchun@bytedance.com> |
xarray: use kmem_cache_alloc_lru to allocate xa_node
The workingset will add the xa_node to the shadow_nodes list. So the allocation of xa_node should be done by kmem_cache_alloc_lru(). Using xas_
xarray: use kmem_cache_alloc_lru to allocate xa_node
The workingset will add the xa_node to the shadow_nodes list. So the allocation of xa_node should be done by kmem_cache_alloc_lru(). Using xas_set_lru() to pass the list_lru which we want to insert xa_node into to set up the xa_node reclaim context correctly.
Link: https://lkml.kernel.org/r/20220228122126.37293-9-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20 |
|
#
22f56b8e |
| 03-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Include bitmap.h from xarray.h
xas_find_chunk() calls find_next_bit(), which is defined in find.h, included from bitmap.h. Inside the kernel, this isn't a problem because bitmap.h is includ
XArray: Include bitmap.h from xarray.h
xas_find_chunk() calls find_next_bit(), which is defined in find.h, included from bitmap.h. Inside the kernel, this isn't a problem because bitmap.h is included from cpumask.h which is dragged in (eventually) by gfp.h. When building the test-suite, that doesn't happen, so we need to include bitmap.h explicitly.
Fixes: 4ade0818cf04 ("tools: sync tools/bitmap with mother linux") Reported-by: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14 |
|
#
25a8de7f |
| 27-Aug-2021 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Add xas_advance()
Add a new helper function to help iterate over multi-index entries.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de
XArray: Add xas_advance()
Add a new helper function to help iterate over multi-index entries.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
show more ...
|
Revision tags: v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30 |
|
#
df59d0a4 |
| 30-Mar-2021 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Add xa_limit_16b
A 16-bit limit is a more common limit than I had realised. Make it generally available.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
Revision tags: v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16 |
|
#
8fc75643 |
| 15-Oct-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: add xas_split
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page en
XArray: add xas_split
In order to use multi-index entries for huge pages in the page cache, we need to be able to split a multi-index entry (eg if a file is truncated in the middle of a huge page entry). This version does not support splitting more than one level of the tree at a time. This is an acceptable limitation for the page cache as we do not expect to support order-12 pages in the near future.
[akpm@linux-foundation.org: export xas_split_alloc() to modules] [willy@infradead.org: fix xarray split] Link: https://lkml.kernel.org/r/20200910175450.GV6583@casper.infradead.org [willy@infradead.org: fix xarray] Link: https://lkml.kernel.org/r/20201001233943.GW20115@casper.infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Qian Cai <cai@lca.pw> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/r/20200903183029.14930-3-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
57417ceb |
| 15-Oct-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: add xa_get_order
Patch series "Fix read-only THP for non-tmpfs filesystems".
As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache w
XArray: add xa_get_order
Patch series "Fix read-only THP for non-tmpfs filesystems".
As described more verbosely in the [3/3] changelog, we can inadvertently put an order-0 page in the page cache which occupies 512 consecutive entries. Users are running into this if they enable the READ_ONLY_THP_FOR_FS config option; see https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also reported it here: https://lore.kernel.org/lkml/20200616013309.GB815@lca.pw/
This is a rather intrusive way of fixing the problem, but has the advantage that I've actually been testing it with the THP patches, which means that it sees far more use than it does upstream -- indeed, Song has been entirely unable to reproduce it. It also has the advantage that it removes a few patches from my gargantuan backlog of THP patches.
This patch (of 3):
This function returns the order of the entry at the index. We need this because there isn't space in the shadow entry to encode its order.
[akpm@linux-foundation.org: export xa_get_order to modules]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Qian Cai <cai@lca.pw> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/r/20200903183029.14930-1-willy@infradead.org Link: https://lkml.kernel.org/r/20200903183029.14930-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8 |
|
#
ca7b639e |
| 02-Aug-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Fix xas_reload for multi-index entries
xas_reload() was only checking that the head entry was still at the head index. If the entry has been split, that's not enough as there may be a diffe
XArray: Fix xas_reload for multi-index entries
xas_reload() was only checking that the head entry was still at the head index. If the entry has been split, that's not enough as there may be a different entry at the specified index now.
Solve this by checking the slot for the requested index instead of the head index.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
f82cd2f0 |
| 18-Aug-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Add private interface for workingset node deletion
Move the tricky bits of dealing with the XArray from the workingset code to the XArray. Make it clear in the documentation that this is a
XArray: Add private interface for workingset node deletion
Move the tricky bits of dealing with the XArray from the workingset code to the XArray. Make it clear in the documentation that this is a private interface, and only export it for the benefit of the test suite.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
8446466c |
| 06-Aug-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Fix xas_for_each_conflict documentation
At one point, xas_for_each_conflict() was going to work this way, and I forgot to update the documentation when I changed my mind.
Signed-off-by: Mat
XArray: Fix xas_for_each_conflict documentation
At one point, xas_for_each_conflict() was going to work this way, and I forgot to update the documentation when I changed my mind.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2 |
|
#
01f39c1c |
| 07-Jun-2020 |
Manfred Spraul <manfred@colorfullife.com> |
xarray.h: correct return code documentation for xa_store_{bh,irq}()
__xa_store() and xa_store() document that the functions can fail, and that the return code can be an xa_err() encoded error code.
xarray.h: correct return code documentation for xa_store_{bh,irq}()
__xa_store() and xa_store() document that the functions can fail, and that the return code can be an xa_err() encoded error code.
xa_store_bh() and xa_store_irq() do not document that the functions can fail and that they can also return xa_err() encoded error codes.
Thus: Update the documentation.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200430111424.16634-1-manfred@colorfullife.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26 |
|
#
7e934cf5 |
| 12-Mar-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
xarray: Fix early termination of xas_for_each_marked
xas_for_each_marked() is using entry == NULL as a termination condition of the iteration. When xas_for_each_marked() is used protected only by RC
xarray: Fix early termination of xas_for_each_marked
xas_for_each_marked() is using entry == NULL as a termination condition of the iteration. When xas_for_each_marked() is used protected only by RCU, this can however race with xas_store(xas, NULL) in the following way:
TASK1 TASK2 page_cache_delete() find_get_pages_range_tag() xas_for_each_marked() xas_find_marked() off = xas_find_chunk()
xas_store(&xas, NULL) xas_init_marks(&xas); ... rcu_assign_pointer(*slot, NULL); entry = xa_entry(off);
And thus xas_for_each_marked() terminates prematurely possibly leading to missed entries in the iteration (translating to missing writeback of some pages or a similar problem).
If we find a NULL entry that has been marked, skip it (unless we're trying to allocate an entry).
Reported-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3 |
|
#
24a448b1 |
| 06-Dec-2019 |
Chengguang Xu <cgxu519@mykernel.net> |
XArray: Fix incorrect comment in header file
256 means Retry entry and 257 means Zero entry, so fix it.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Matthew Wilcox (Oracle) <w
XArray: Fix incorrect comment in header file
256 means Retry entry and 257 means Zero entry, so fix it.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
00ed452c |
| 12-Jan-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Add xa_for_each_range
This function supports iterating over a range of an array. Also add documentation links for xa_for_each_start().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infrade
XArray: Add xa_for_each_range
This function supports iterating over a range of an array. Also add documentation links for xa_for_each_start().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
82a95849 |
| 17-Jan-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
XArray: Add wrappers for nested spinlocks
Some users need to take an xarray lock while holding another xarray lock.
Reported-by: Doug Gilbert <dgilbert@interlog.com> Signed-off-by: Matthew Wilcox (
XArray: Add wrappers for nested spinlocks
Some users need to take an xarray lock while holding another xarray lock.
Reported-by: Doug Gilbert <dgilbert@interlog.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7 |
|
#
13bea898 |
| 14-Oct-2019 |
Randy Dunlap <rdunlap@infradead.org> |
xarray.h: fix kernel-doc warning
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:
include/linux/xarray.h:232: WARNING: Unexpected indentation.
Link: http://lkml.kernel.org/r/89ba2134-ce23-7c
xarray.h: fix kernel-doc warning
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:
include/linux/xarray.h:232: WARNING: Unexpected indentation.
Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5 |
|
#
7b785645 |
| 24-May-2019 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: fix page cache convergence regression
Since a28334862993 ("page cache: Finish XArray conversion"), on most major Linux distributions, the page cache doesn't correctly transition when the hot dat
mm: fix page cache convergence regression
Since a28334862993 ("page cache: Finish XArray conversion"), on most major Linux distributions, the page cache doesn't correctly transition when the hot data set is changing, and leaves the new pages thrashing indefinitely instead of kicking out the cold ones.
On a freshly booted, freshly ssh'd into virtual machine with 1G RAM running stock Arch Linux:
[root@ham ~]# ./reclaimtest.sh + dd of=workingset-a bs=1M count=0 seek=600 + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + ./mincore workingset-a 153600/153600 workingset-a + dd of=workingset-b bs=1M count=0 seek=600 + cat workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 104029/153600 workingset-a 120086/153600 workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 104029/153600 workingset-a 120268/153600 workingset-b
workingset-b is a 600M file on a 1G host that is otherwise entirely idle. No matter how often it's being accessed, it won't get cached.
While investigating, I noticed that the non-resident information gets aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is a problem because a workingset transition like this relies on the non-resident information tracked in the page cache tree of evicted file ranges: when the cache faults are refaults of recently evicted cache, we challenge the existing active set, and that allows a new workingset to establish itself.
Tracing the shrinker that maintains this memory revealed that all page cache tree nodes were allocated to the root cgroup. This is a problem, because 1) the shrinker sizes the amount of non-resident information it keeps to the size of the cgroup's other memory and 2) on most major Linux distributions, only kernel threads live in the root cgroup and everything else gets put into services or session groups:
[root@ham ~]# cat /proc/self/cgroup 0::/user.slice/user-0.slice/session-c1.scope
As a result, we basically maintain no non-resident information for the workloads running on the system, thus breaking the caching algorithm.
Looking through the code, I found the culprit in the above-mentioned patch: when switching from the radix tree to xarray, it dropped the __GFP_ACCOUNT flag from the tree node allocations - the flag that makes sure the allocated memory gets charged to and tracked by the cgroup of the calling process - in this case, the one doing the fault.
To fix this, allow xarray users to specify per-tree flag that makes xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache tree annotation to request such cgroup tracking for the cache nodes.
With this patch applied, the page cache correctly converges on new workingsets again after just a few iterations:
[root@ham ~]# ./reclaimtest.sh + dd of=workingset-a bs=1M count=0 seek=600 + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + cat workingset-a + ./mincore workingset-a 153600/153600 workingset-a + dd of=workingset-b bs=1M count=0 seek=600 + cat workingset-b + ./mincore workingset-a workingset-b 124607/153600 workingset-a 87876/153600 workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 81313/153600 workingset-a 133321/153600 workingset-b + cat workingset-b + ./mincore workingset-a workingset-b 63036/153600 workingset-a 153600/153600 workingset-b
Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25 |
|
#
962033d5 |
| 20-Feb-2019 |
Matthew Wilcox <willy@infradead.org> |
XArray: Use xa_cmpxchg to implement xa_reserve
Jason feels this is clearer, and it saves a function and an exported symbol.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Matthew Wilco
XArray: Use xa_cmpxchg to implement xa_reserve
Jason feels this is clearer, and it saves a function and an exported symbol.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Matthew Wilcox <willy@infradead.org>
show more ...
|
#
b38f6c50 |
| 20-Feb-2019 |
Matthew Wilcox <willy@infradead.org> |
XArray: Fix xa_release in allocating arrays
xa_cmpxchg() was a little too magic in turning ZERO entries into NULL, and would leave the entry set to the ZERO entry instead of releasing it for future
XArray: Fix xa_release in allocating arrays
xa_cmpxchg() was a little too magic in turning ZERO entries into NULL, and would leave the entry set to the ZERO entry instead of releasing it for future use. After careful review of existing users of xa_cmpxchg(), change the semantics so that it does not translate either incoming argument from NULL into ZERO entries.
Add several tests to the test-suite to make sure this problem doesn't come back.
Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Matthew Wilcox <willy@infradead.org>
show more ...
|
Revision tags: v4.19.24, v4.19.23, v4.19.22, v4.19.21 |
|
#
f818b82b |
| 08-Feb-2019 |
Matthew Wilcox <willy@infradead.org> |
XArray: Mark xa_insert and xa_reserve as must_check
If the user doesn't care about the return value from xa_insert(), then they should be using xa_store() instead. The point of xa_reserve() is to g
XArray: Mark xa_insert and xa_reserve as must_check
If the user doesn't care about the return value from xa_insert(), then they should be using xa_store() instead. The point of xa_reserve() is to get the return value early before taking another lock, so this should also be __must_check.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
show more ...
|
Revision tags: v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18 |
|
#
2fa044e5 |
| 06-Nov-2018 |
Matthew Wilcox <willy@infradead.org> |
XArray: Add cyclic allocation
This differs slightly from the IDR equivalent in five ways.
1. It can allocate up to UINT_MAX instead of being limited to INT_MAX, like xa_alloc(). Also like xa_al
XArray: Add cyclic allocation
This differs slightly from the IDR equivalent in five ways.
1. It can allocate up to UINT_MAX instead of being limited to INT_MAX, like xa_alloc(). Also like xa_alloc(), it will write to the 'id' pointer before placing the entry in the XArray. 2. The 'next' cursor is allocated separately from the XArray instead of being part of the IDR. This saves memory for all the users which do not use the cyclic allocation API and suits some users better. 3. It returns -EBUSY instead of -ENOSPC. 4. It will attempt to wrap back to the minimum value on memory allocation failure as well as on an -EBUSY error, assuming that a user would rather allocate a small ID than suffer an ID allocation failure. 5. It reports whether it has wrapped, which is important to some users.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
show more ...
|
#
a3e4d3f9 |
| 31-Dec-2018 |
Matthew Wilcox <willy@infradead.org> |
XArray: Redesign xa_alloc API
It was too easy to forget to initialise the start index. Add an xa_limit data structure which can be used to pass min & max, and define a couple of special values for
XArray: Redesign xa_alloc API
It was too easy to forget to initialise the start index. Add an xa_limit data structure which can be used to pass min & max, and define a couple of special values for common cases. Also add some more tests cribbed from the IDR test suite. Change the return value from -ENOSPC to -EBUSY to match xa_insert().
Signed-off-by: Matthew Wilcox <willy@infradead.org>
show more ...
|