Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14 |
|
#
6156277d |
| 25-Jan-2024 |
Yosry Ahmed <yosryahmed@google.com> |
mm: zswap: fix missing folio cleanup in writeback race path
commit e3b63e966cac0bf78aaa1efede1827a252815a1d upstream.
In zswap_writeback_entry(), after we get a folio from __read_swap_cache_async()
mm: zswap: fix missing folio cleanup in writeback race path
commit e3b63e966cac0bf78aaa1efede1827a252815a1d upstream.
In zswap_writeback_entry(), after we get a folio from __read_swap_cache_async(), we grab the tree lock again to check that the swap entry was not invalidated and recycled. If it was, we delete the folio we just added to the swap cache and exit.
However, __read_swap_cache_async() returns the folio locked when it is newly allocated, which is always true for this path, and the folio is ref'd. Make sure to unlock and put the folio before returning.
This was discovered by code inspection, probably because this path handles a race condition that should not happen often, and the bug would not crash the system, it will only strand the folio indefinitely.
Link: https://lkml.kernel.org/r/20240125085127.1327013-1-yosryahmed@google.com Fixes: 04fc7816089c ("mm: fix zswap writeback race condition") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
7a361095 |
| 07-Feb-2024 |
Chengming Zhou <zhouchengming@bytedance.com> |
mm/zswap: invalidate duplicate entry when !zswap_enabled
commit 678e54d4bb9a4822f8ae99690ac131c5d490cdb1 upstream.
We have to invalidate any duplicate entry even when !zswap_enabled since zswap can
mm/zswap: invalidate duplicate entry when !zswap_enabled
commit 678e54d4bb9a4822f8ae99690ac131c5d490cdb1 upstream.
We have to invalidate any duplicate entry even when !zswap_enabled since zswap can be disabled anytime. If the folio store success before, then got dirtied again but zswap disabled, we won't invalidate the old duplicate entry in the zswap_store(). So later lru writeback may overwrite the new data in swapfile.
Link: https://lkml.kernel.org/r/20240208023254.3873823-1-chengming.zhou@linux.dev Fixes: 42c06a0e8ebe ("mm: kill frontswap") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7 |
|
#
969d63e1 |
| 06-Oct-2023 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: zswap: fix pool refcount bug around shrink_worker()
When a zswap store fails due to the limit, it acquires a pool reference and queues the shrinker. When the shrinker runs, it drops the referen
mm: zswap: fix pool refcount bug around shrink_worker()
When a zswap store fails due to the limit, it acquires a pool reference and queues the shrinker. When the shrinker runs, it drops the reference. However, there can be multiple store attempts before the shrinker wakes up and runs once. This results in reference leaks and eventual saturation warnings for the pool refcount.
Fix this by dropping the reference again when the shrinker is already queued. This ensures one reference per shrinker run.
Link: https://lkml.kernel.org/r/20231006160024.170748-1-hannes@cmpxchg.org Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Chris Mason <clm@fb.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: <stable@vger.kernel.org> [5.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.5.6, v6.5.5 |
|
#
ca56489c |
| 22-Sep-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: fix potential memory corruption on duplicate store
While stress-testing zswap a memory corruption was happening when writing back pages. __frontswap_store used to check for duplicate ent
mm: zswap: fix potential memory corruption on duplicate store
While stress-testing zswap a memory corruption was happening when writing back pages. __frontswap_store used to check for duplicate entries before attempting to store a page in zswap, this was because if the store fails the old entry isn't removed from the tree. This change removes duplicate entries in zswap_store before the actual attempt.
[cerasuolodomenico@gmail.com: add a warning and a comment, per Johannes] Link: https://lkml.kernel.org/r/20230925130002.1929369-1-cerasuolodomenico@gmail.com Link: https://lkml.kernel.org/r/20230922172211.1704917-1-cerasuolodomenico@gmail.com Fixes: 42c06a0e8ebe ("mm: kill frontswap") Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
3d2c9087 |
| 21-Aug-2023 |
David Hildenbrand <david@redhat.com> |
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
Let's simply work on the folio directly and remove the helpers.
Link: https://lkml.kernel.org/r/20230821160849.531668-4-david@redhat.co
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
Let's simply work on the folio directly and remove the helpers.
Link: https://lkml.kernel.org/r/20230821160849.531668-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Chris Li <chrisl@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.46, v6.1.45, v6.1.44 |
|
#
97157d89 |
| 08-Aug-2023 |
Xiu Jianfeng <xiujianfeng@huawei.com> |
mm: zswap: update comment for struct zswap_entry
Since commit 0bb488498c98 ("mm: zswap: remove zswap_header"), the 'offset' has been replaced by swpentry, update the comment for it, and also add com
mm: zswap: update comment for struct zswap_entry
Since commit 0bb488498c98 ("mm: zswap: remove zswap_header"), the 'offset' has been replaced by swpentry, update the comment for it, and also add comment for 'objcg'.
Link: https://lkml.kernel.org/r/20230808062056.292950-1-xiujianfeng@huaweicloud.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.43 |
|
#
98804a94 |
| 27-Jul-2023 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: zswap: kill zswap_get_swap_cache_page()
The __read_swap_cache_async() interface isn't more difficult to understand than what the helper abstracts. Save the indirection and a level of indentatio
mm: zswap: kill zswap_get_swap_cache_page()
The __read_swap_cache_async() interface isn't more difficult to understand than what the helper abstracts. Save the indirection and a level of indentation for the primary work of the writeback func.
Link: https://lkml.kernel.org/r/20230727162343.1415598-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
73108957 |
| 27-Jul-2023 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: zswap: tighten up entry invalidation
Removing a zswap entry from the tree is tied to an explicit operation that's supposed to drop the base reference: swap invalidation, exclusive load, duplicat
mm: zswap: tighten up entry invalidation
Removing a zswap entry from the tree is tied to an explicit operation that's supposed to drop the base reference: swap invalidation, exclusive load, duplicate store. Don't silently remove the entry on final put, but instead warn if an entry is in tree without reference.
While in that diff context, convert a BUG_ON to a WARN_ON_ONCE. No need to crash on a refcount underflow.
Link: https://lkml.kernel.org/r/20230727162343.1415598-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
56c67049 |
| 27-Jul-2023 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: zswap: use zswap_invalidate_entry() for duplicates
Patch series "mm: zswap: three cleanups".
Three small cleanups to zswap, the first one suggested by Yosry during the frontswap removal.
This
mm: zswap: use zswap_invalidate_entry() for duplicates
Patch series "mm: zswap: three cleanups".
Three small cleanups to zswap, the first one suggested by Yosry during the frontswap removal.
This patch (of 3):
Minor cleanup. Instead of open-coding the tree deletion and the put, use the zswap_invalidate_entry() convenience helper.
Link: https://lkml.kernel.org/r/20230727162343.1415598-1-hannes@cmpxchg.org Link: https://lkml.kernel.org/r/20230727162343.1415598-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
ca54f6d8 |
| 14-Jul-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
zswap: make zswap_load() take a folio
Only convert a few easy parts of this function to use the folio passed in; convert back to struct page for the majority of it. Removes three hidden calls to co
zswap: make zswap_load() take a folio
Only convert a few easy parts of this function to use the folio passed in; convert back to struct page for the majority of it. Removes three hidden calls to compound_head().
Link: https://lkml.kernel.org/r/20230715042343.434588-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
074e3e26 |
| 14-Jul-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
memcg: convert get_obj_cgroup_from_page to get_obj_cgroup_from_folio
As the one caller now has a folio, pass it in and use it. Removes three calls to compound_head().
Link: https://lkml.kernel.org
memcg: convert get_obj_cgroup_from_page to get_obj_cgroup_from_folio
As the one caller now has a folio, pass it in and use it. Removes three calls to compound_head().
Link: https://lkml.kernel.org/r/20230715042343.434588-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
34f4c198 |
| 14-Jul-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
zswap: make zswap_store() take a folio
Patch series "Followup folio conversions for zswap".
With frontswap killed, it's worth converting the zswap_load() and zswap_store() functions to take a folio
zswap: make zswap_store() take a folio
Patch series "Followup folio conversions for zswap".
With frontswap killed, it's worth converting the zswap_load() and zswap_store() functions to take a folio instead of a page pointer. They aren't converted to support large folios, but there are a lot of unnecessary calls to compound_head() that are removed by these patches.
This patch (of 4):
Only convert a few easy parts of this function to use the folio passed in; convert back to struct page for the majority of it. This does remove a few hidden calls to compound_head().
Link: https://lkml.kernel.org/r/20230715042343.434588-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230715042343.434588-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
42c06a0e |
| 17-Jul-2023 |
Johannes Weiner <hannes@cmpxchg.org> |
mm: kill frontswap
The only user of frontswap is zswap, and has been for a long time. Have swap call into zswap directly and remove the indirection.
[hannes@cmpxchg.org: remove obsolete comment, p
mm: kill frontswap
The only user of frontswap is zswap, and has been for a long time. Have swap call into zswap directly and remove the indirection.
[hannes@cmpxchg.org: remove obsolete comment, per Yosry] Link: https://lkml.kernel.org/r/20230719142832.GA932528@cmpxchg.org [fengwei.yin@intel.com: don't warn if none swapcache folio is passed to zswap_load] Link: https://lkml.kernel.org/r/20230810095652.3905184-1-fengwei.yin@intel.com Link: https://lkml.kernel.org/r/20230717160227.GA867137@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35 |
|
#
b8cf32dc |
| 20-Jun-2023 |
Yosry Ahmed <yosryahmed@google.com> |
mm: zswap: multiple zpools support
Support using multiple zpools of the same type in zswap, for concurrency purposes. A fixed number of 32 zpools is suggested by this commit, which was determined e
mm: zswap: multiple zpools support
Support using multiple zpools of the same type in zswap, for concurrency purposes. A fixed number of 32 zpools is suggested by this commit, which was determined empirically. It can be later changed or made into a config option if needed.
On a setup with zswap and zsmalloc, comparing a single zpool to 32 zpools shows improvements in the zsmalloc lock contention, especially on the swap out path.
The following shows the perf analysis of the swapout path when 10 workloads are simultaneously reclaiming and refaulting tmpfs pages. There are some improvements on the swap in path as well, but less significant.
1 zpool:
|--28.99%--zswap_frontswap_store | <snip> | |--8.98%--zpool_map_handle | | | --8.98%--zs_zpool_map | | | --8.95%--zs_map_object | | | --8.38%--_raw_spin_lock | | | --7.39%--queued_spin_lock_slowpath | |--8.82%--zpool_malloc | | | --8.82%--zs_zpool_malloc | | | --8.80%--zs_malloc | | | |--7.21%--_raw_spin_lock | | | | | --6.81%--queued_spin_lock_slowpath <snip>
32 zpools:
|--16.73%--zswap_frontswap_store | <snip> | |--1.81%--zpool_malloc | | | --1.81%--zs_zpool_malloc | | | --1.79%--zs_malloc | | | --0.73%--obj_malloc | |--1.06%--zswap_update_total_size | |--0.59%--zpool_map_handle | | | --0.59%--zs_zpool_map | | | --0.57%--zs_map_object | | | --0.51%--_raw_spin_lock <snip>
Link: https://lkml.kernel.org/r/20230620194644.3142384-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Suggested-by: Yu Zhao <yuzhao@google.com> Acked-by: Chris Li (Google) <chrisl@kernel.org> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Tested-by: Nhat Pham <nphamcs@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
18a93707 |
| 21-Jun-2023 |
Yosry Ahmed <yosryahmed@google.com> |
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before returning from zswap_frontswap_load(), after dropping the local referen
mm: zswap: fix double invalidate with exclusive loads
If exclusive loads are enabled for zswap, we invalidate the entry before returning from zswap_frontswap_load(), after dropping the local reference. However, the tree lock is dropped during decompression after the local reference is acquired, so the entry could be invalidated before we drop the local ref. If this happens, the entry is freed once we drop the local ref, and zswap_invalidate_entry() tries to invalidate an already freed entry.
Fix this by: (a) Making sure zswap_invalidate_entry() is always called with a local ref held, to avoid being called on a freed entry. (b) Making sure zswap_invalidate_entry() only drops the ref if the entry was actually on the rbtree. Otherwise, another invalidation could have already happened, and the initial ref is already dropped.
With these changes, there is no need to check that there is no need to make sure the entry still exists in the tree in zswap_reclaim_entry() before invalidating it, as zswap_reclaim_entry() will make this check internally.
Link: https://lkml.kernel.org/r/20230621093009.637544-1-yosryahmed@google.com Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
418fd29d |
| 14-Jun-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: invaldiate entry after writeback
When an entry started writeback, it used to be invalidated with ref count logic alone, meaning that it would stay on the tree until all references were pu
mm: zswap: invaldiate entry after writeback
When an entry started writeback, it used to be invalidated with ref count logic alone, meaning that it would stay on the tree until all references were put. The problem with this behavior is that as soon as the writeback started, the ownership of the data held by the entry is passed to the swapcache and should not be left in zswap too. Currently there are no known issues because of this, but this change explicitly invalidates an entry that started writeback to reduce opportunities for future bugs.
This patch is a follow up on the series titled "mm: zswap: move writeback LRU from zpool to zswap" + commit f090b7949768("mm: zswap: support exclusive loads").
Link: https://lkml.kernel.org/r/20230614143122.74471-1-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.34 |
|
#
0bb48849 |
| 12-Jun-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: remove zswap_header
Previously, zswap_header served the purpose of storing the swpentry within zpool pages. This allowed zpool implementations to pass relevant information to the writeba
mm: zswap: remove zswap_header
Previously, zswap_header served the purpose of storing the swpentry within zpool pages. This allowed zpool implementations to pass relevant information to the writeback function. However, with the current implementation, writeback is directly handled within zswap. Consequently, there is no longer a necessity for zswap_header, as the swp_entry_t can be stored directly in zswap_entry.
Link: https://lkml.kernel.org/r/20230612093815.133504-8-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Suggested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
ff9d5ba2 |
| 12-Jun-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: simplify writeback function
zswap_writeback_entry() used to be a callback for the backends, which don't know about struct zswap_entry.
Now that the only user is the generic zswap LRU rec
mm: zswap: simplify writeback function
zswap_writeback_entry() used to be a callback for the backends, which don't know about struct zswap_entry.
Now that the only user is the generic zswap LRU reclaimer, it can be simplified: pass the pinned zswap_entry directly, and consolidate the refcount management in the shrink function.
Link: https://lkml.kernel.org/r/20230612093815.133504-7-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
35499e2b |
| 12-Jun-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: remove shrink from zpool interface
Now that all three zswap backends have removed their shrink code, it is no longer necessary for the zpool interface to include shrink/writeback endpoint
mm: zswap: remove shrink from zpool interface
Now that all three zswap backends have removed their shrink code, it is no longer necessary for the zpool interface to include shrink/writeback endpoints.
Link: https://lkml.kernel.org/r/20230612093815.133504-6-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
f999f38b |
| 12-Jun-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing the LRU managem
mm: zswap: add pool shrinking mechanism
Patch series "mm: zswap: move writeback LRU from zpool to zswap", v3.
This series aims to improve the zswap reclaim mechanism by reorganizing the LRU management. In the current implementation, the LRU is maintained within each zpool driver, resulting in duplicated code across the three drivers. The proposed change consists in moving the LRU management from the individual implementations up to the zswap layer.
The primary objective of this refactoring effort is to simplify the codebase. By unifying the reclaim loop and consolidating LRU handling within zswap, we can eliminate redundant code and improve maintainability. Additionally, this change enables the reclamation of stored pages in their actual LRU order. Presently, the zpool drivers link backing pages in an LRU, causing compressed pages with different LRU positions to be written back simultaneously.
The series consists of several patches. The first patch implements the LRU and the reclaim loop in zswap, but it is not used yet because all three driver implementations are marked as zpool_evictable. The following three commits modify each zpool driver to be not zpool_evictable, allowing the use of the reclaim loop in zswap. As the drivers removed their shrink functions, the zpool interface is then trimmed by removing zpool_evictable, zpool_ops, and zpool_shrink. Finally, the code in zswap is further cleaned up by simplifying the writeback function and removing the now unnecessary zswap_header.
This patch (of 7):
Each zpool driver (zbud, z3fold and zsmalloc) implements its own shrink function, which is called from zpool_shrink. However, with this commit, a unified shrink function is added to zswap. The ultimate goal is to eliminate the need for zpool_shrink once all zpool implementations have dropped their shrink code.
To ensure the functionality of each commit, this change focuses solely on adding the mechanism itself. No modifications are made to the backends, meaning that functionally, there are no immediate changes. The zswap mechanism will only come into effect once the backends have removed their shrink code. The subsequent commits will address the modifications needed in the backends.
Link: https://lkml.kernel.org/r/20230612093815.133504-1-cerasuolodomenico@gmail.com Link: https://lkml.kernel.org/r/20230612093815.133504-2-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.33 |
|
#
b9c91c43 |
| 07-Jun-2023 |
Yosry Ahmed <yosryahmed@google.com> |
mm: zswap: support exclusive loads
Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets") removed support for exclusive loads from frontswap as it was not used. Bring back exclusiv
mm: zswap: support exclusive loads
Commit 71024cb4a0bf ("frontswap: remove frontswap_tmem_exclusive_gets") removed support for exclusive loads from frontswap as it was not used. Bring back exclusive loads support to frontswap by adding an "exclusive" output parameter to frontswap_ops->load.
On the zswap side, add a module parameter to enable/disable exclusive loads, and a config option to control the boot default value. Refactor zswap entry invalidation in zswap_frontswap_invalidate_page() into zswap_invalidate_entry() to reuse it in zswap_frontswap_load() if exclusive loads are enabled.
With exclusive loads, we avoid having two copies of the same page in memory (compressed & uncompressed) after faulting it in from zswap. On the other hand, if the page is to be reclaimed again without being dirtied, it will be re-compressed. Compression is not usually slow, and a page that was just faulted in is less likely to be reclaimed again soon.
Link: https://lkml.kernel.org/r/20230607195143.1473802-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Suggested-by: Yu Zhao <yuzhao@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.32 |
|
#
0bdf0efa |
| 30-May-2023 |
Nhat Pham <nphamcs@gmail.com> |
zswap: do not shrink if cgroup may not zswap
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy
zswap: do not shrink if cgroup may not zswap
Before storing a page, zswap first checks if the number of stored pages exceeds the limit specified by memory.zswap.max, for each cgroup in the hierarchy. If this limit is reached or exceeded, then zswap shrinking is triggered and short-circuits the store attempt.
However, since the zswap's LRU is not memcg-aware, this can create the following pathological behavior: the cgroup whose zswap limit is 0 will evict pages from other cgroups continually, without lowering its own zswap usage. This means the shrinking will continue until the need for swap ceases or the pool becomes empty.
As a result of this, we observe a disproportionate amount of zswap writeback and a perpetually small zswap pool in our experiments, even though the pool limit is never hit.
More generally, a cgroup might unnecessarily evict pages from other cgroups before we drive the memcg back below its limit.
This patch fixes the issue by rejecting zswap store attempt without shrinking the pool when obj_cgroup_may_zswap() returns false.
[akpm@linux-foundation.org: fix return of unintialized value] [akpm@linux-foundation.org: s/ENOSPC/ENOMEM/] Link: https://lkml.kernel.org/r/20230530222440.2777700-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20230530232435.3097106-1-nphamcs@gmail.com Fixes: f4840ccfca25 ("zswap: memcg accounting") Signed-off-by: Nhat Pham <nphamcs@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.31 |
|
#
e0228d59 |
| 26-May-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: zswap: shrink until can accept
This update addresses an issue with the zswap reclaim mechanism, which hinders the efficient offloading of cold pages to disk, thereby compromising the preservatio
mm: zswap: shrink until can accept
This update addresses an issue with the zswap reclaim mechanism, which hinders the efficient offloading of cold pages to disk, thereby compromising the preservation of the LRU order and consequently diminishing, if not inverting, its performance benefits.
The functioning of the zswap shrink worker was found to be inadequate, as shown by basic benchmark test. For the test, a kernel build was utilized as a reference, with its memory confined to 1G via a cgroup and a 5G swap file provided. The results are presented below, these are averages of three runs without the use of zswap:
real 46m26s user 35m4s sys 7m37s
With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G system), the results changed to:
real 56m4s user 35m13s sys 8m43s
written_back_pages: 18 reject_reclaim_fail: 0 pool_limit_hit:1478
Besides the evident regression, one thing to notice from this data is the extremely low number of written_back_pages and pool_limit_hit.
The pool_limit_hit counter, which is increased in zswap_frontswap_store when zswap is completely full, doesn't account for a particular scenario: once zswap hits his limit, zswap_pool_reached_full is set to true; with this flag on, zswap_frontswap_store rejects pages if zswap is still above the acceptance threshold. Once we include the rejections due to zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478 to a significant 21578266.
Zswap is stuck in an undesirable state where it rejects pages because it's above the acceptance threshold, yet fails to attempt memory reclaimation. This happens because the shrink work is only queued when zswap_frontswap_store detects that it's full and the work itself only reclaims one page per run.
This state results in hot pages getting written directly to disk, while cold ones remain memory, waiting only to be invalidated. The LRU order is completely broken and zswap ends up being just an overhead without providing any benefits.
This commit applies 2 changes: a) the shrink worker is set to reclaim pages until the acceptance threshold is met and b) the task is also enqueued when zswap is not full but still above the threshold.
Testing this suggested update showed much better numbers:
real 36m37s user 35m8s sys 9m32s
written_back_pages: 10459423 reject_reclaim_fail: 12896 pool_limit_hit: 75653
Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit") Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjenning@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.30, v6.1.29, v6.1.28 |
|
#
04fc7816 |
| 03-May-2023 |
Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
mm: fix zswap writeback race condition
The zswap writeback mechanism can cause a race condition resulting in memory corruption, where a swapped out page gets swapped in with data that was written to
mm: fix zswap writeback race condition
The zswap writeback mechanism can cause a race condition resulting in memory corruption, where a swapped out page gets swapped in with data that was written to a different page.
The race unfolds like this: 1. a page with data A and swap offset X is stored in zswap 2. page A is removed off the LRU by zpool driver for writeback in zswap-shrink work, data for A is mapped by zpool driver 3. user space program faults and invalidates page entry A, offset X is considered free 4. kswapd stores page B at offset X in zswap (zswap could also be full, if so, page B would then be IOed to X, then skip step 5.) 5. entry A is replaced by B in tree->rbroot, this doesn't affect the local reference held by zswap-shrink work 6. zswap-shrink work writes back A at X, and frees zswap entry A 7. swapin of slot X brings A in memory instead of B
The fix: Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW), zswap-shrink work just checks that the local zswap_entry reference is still the same as the one in the tree. If it's not the same it means that it's either been invalidated or replaced, in both cases the writeback is aborted because the local entry contains stale data.
Reproducer: I originally found this by running `stress` overnight to validate my work on the zswap writeback mechanism, it manifested after hours on my test machine. The key to make it happen is having zswap writebacks, so whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do the trick.
In order to reproduce this faster on a vm, I setup a system with ~100M of available memory and a 500M swap file, then running `stress --vm 1 --vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens of minutes. One can speed things up even more by swinging /sys/module/zswap/parameters/max_pool_percent up and down between, say, 20 and 1; this makes it reproduce in tens of seconds. It's crucial to set `--vm-stride` to something other than 4096 otherwise `stress` won't realize that memory has been corrupted because all pages would have the same data.
Link: https://lkml.kernel.org/r/20230503151200.19707-1-cerasuolodomenico@gmail.com Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Chris Li (Google) <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23 |
|
#
141fdeec |
| 03-Apr-2023 |
Liu Shixin <liushixin2@huawei.com> |
mm/zswap: delay the initialization of zswap
Since some users may not use zswap, the zswap_pool is wasted. Save memory by delaying the initialization of zswap until enabled.
[liushixin2@huawei.com:
mm/zswap: delay the initialization of zswap
Since some users may not use zswap, the zswap_pool is wasted. Save memory by delaying the initialization of zswap until enabled.
[liushixin2@huawei.com: fix some pattern problem suggested by Christoph] Link: https://lkml.kernel.org/r/20230411093632.822290-4-liushixin2@huawei.com Link: https://lkml.kernel.org/r/20230403121318.1876082-4-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|