Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
30515231 |
| 06-Feb-2024 |
Kairui Song <kasong@tencent.com> |
mm/swap: fix race when skipping swapcache
commit 13ddaf26be324a7f951891ecd9ccd04466d27458 upstream.
When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads swapin the same entry at t
mm/swap: fix race when skipping swapcache
commit 13ddaf26be324a7f951891ecd9ccd04466d27458 upstream.
When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads swapin the same entry at the same time, they get different pages (A, B). Before one thread (T0) finishes the swapin and installs page (A) to the PTE, another thread (T1) could finish swapin of page (B), swap_free the entry, then swap out the possibly modified page reusing the same entry. It breaks the pte_same check in (T0) because PTE value is unchanged, causing ABA problem. Thread (T0) will install a stalled page (A) into the PTE and cause data corruption.
One possible callstack is like this:
CPU0 CPU1 ---- ---- do_swap_page() do_swap_page() with same entry <direct swapin path> <direct swapin path> <alloc page A> <alloc page B> swap_read_folio() <- read to page A swap_read_folio() <- read to page B <slow on later locks or interrupt> <finished swapin first> ... set_pte_at() swap_free() <- entry is free <write to page B, now page A stalled> <swap out page B to same swap entry> pte_same() <- Check pass, PTE seems unchanged, but page A is stalled! swap_free() <- page B content lost! set_pte_at() <- staled page A installed!
And besides, for ZRAM, swap_free() allows the swap device to discard the entry content, so even if page (B) is not modified, if swap_read_folio() on CPU0 happens later than swap_free() on CPU1, it may also cause data loss.
To fix this, reuse swapcache_prepare which will pin the swap entry using the cache flag, and allow only one thread to swap it in, also prevent any parallel code from putting the entry in the cache. Release the pin after PT unlocked.
Racers just loop and wait since it's a rare and very short event. A schedule_timeout_uninterruptible(1) call is added to avoid repeated page faults wasting too much CPU, causing livelock or adding too much noise to perf statistics. A similar livelock issue was described in commit 029c4628b2eb ("mm: swap: get rid of livelock in swapin readahead")
Reproducer:
This race issue can be triggered easily using a well constructed reproducer and patched brd (with a delay in read path) [1]:
With latest 6.8 mainline, race caused data loss can be observed easily: $ gcc -g -lpthread test-thread-swap-race.c && ./a.out Polulating 32MB of memory region... Keep swapping out... Starting round 0... Spawning 65536 workers... 32746 workers spawned, wait for done... Round 0: Error on 0x5aa00, expected 32746, got 32743, 3 data loss! Round 0: Error on 0x395200, expected 32746, got 32743, 3 data loss! Round 0: Error on 0x3fd000, expected 32746, got 32737, 9 data loss! Round 0 Failed, 15 data loss!
This reproducer spawns multiple threads sharing the same memory region using a small swap device. Every two threads updates mapped pages one by one in opposite direction trying to create a race, with one dedicated thread keep swapping out the data out using madvise.
The reproducer created a reproduce rate of about once every 5 minutes, so the race should be totally possible in production.
After this patch, I ran the reproducer for over a few hundred rounds and no data loss observed.
Performance overhead is minimal, microbenchmark swapin 10G from 32G zram:
Before: 10934698 us After: 11157121 us Cached: 13155355 us (Dropping SWP_SYNCHRONOUS_IO flag)
[kasong@tencent.com: v4] Link: https://lkml.kernel.org/r/20240219082040.7495-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20240206182559.32264-1-ryncsn@gmail.com Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device") Reported-by: "Huang, Ying" <ying.huang@intel.com> Closes: https://lore.kernel.org/lkml/87bk92gqpx.fsf_-_@yhuang6-desk2.ccr.corp.intel.com/ Link: https://github.com/ryncsn/emm-test-project/tree/master/swap-stress-race [1] Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Yu Zhao <yuzhao@google.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Barry Song <21cnbao@gmail.com> Cc: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37 |
|
#
b243dcbf |
| 30-Jun-2023 |
Suren Baghdasaryan <surenb@google.com> |
swap: remove remnants of polling from read_swap_cache_async
Patch series "Per-VMA lock support for swap and userfaults", v7.
When per-VMA locks were introduced in [1] several types of page faults w
swap: remove remnants of polling from read_swap_cache_async
Patch series "Per-VMA lock support for swap and userfaults", v7.
When per-VMA locks were introduced in [1] several types of page faults would still fall back to mmap_lock to keep the patchset simple. Among them are swap and userfault pages. The main reason for skipping those cases was the fact that mmap_lock could be dropped while handling these faults and that required additional logic to be implemented. Implement the mechanism to allow per-VMA locks to be dropped for these cases.
First, change handle_mm_fault to drop per-VMA locks when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED to be consistent with the way mmap_lock is handled. Then change folio_lock_or_retry to accept vm_fault and return vm_fault_t which simplifies later patches. Finally allow swap and uffd page faults to be handled under per-VMA locks by dropping per-VMA and retrying, the same way it's done under mmap_lock. Naturally, once VMA lock is dropped that VMA should be assumed unstable and can't be used.
This patch (of 6):
Commit [1] introduced IO polling support duding swapin to reduce swap read latency for block devices that can be polled. However later commit [2] removed polling support. Therefore it seems safe to remove do_poll parameter in read_swap_cache_async and always call swap_readpage with synchronous=false waiting for IO completion in folio_lock_or_retry.
[1] commit 23955622ff8d ("swap: add block io poll in swapin path") [2] commit 9650b453a3d4 ("block: ignore RWF_HIPRI hint for sync dio")
Link: https://lkml.kernel.org/r/20230630211957.1341547-1-surenb@google.com Link: https://lkml.kernel.org/r/20230630211957.1341547-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Minchan Kim <minchan@google.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9 |
|
#
e3e2762b |
| 25-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
mm: remove the __swap_writepage return value
__swap_writepage always returns 0.
Link: https://lkml.kernel.org/r/20230125133436.447864-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc:
mm: remove the __swap_writepage return value
__swap_writepage always returns 0.
Link: https://lkml.kernel.org/r/20230125133436.447864-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
a8c1408f |
| 25-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
mm: remove the swap_readpage return value
swap_readpage always returns 0, and no caller checks the return value.
[akpm@linux-foundation.org: fix void-returning swap_readpage() stub, per Keith] Link
mm: remove the swap_readpage return value
swap_readpage always returns 0, and no caller checks the return value.
[akpm@linux-foundation.org: fix void-returning swap_readpage() stub, per Keith] Link: https://lkml.kernel.org/r/20230125133436.447864-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3 |
|
#
524984ff |
| 19-Oct-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm: convert find_get_incore_page() to filemap_get_incore_folio()
Return the containing folio instead of the precise page. One of the callers wants the folio and the other can do the folio->page con
mm: convert find_get_incore_page() to filemap_get_incore_folio()
Return the containing folio instead of the precise page. One of the callers wants the folio and the other can do the folio->page conversion itself. Nets 442 bytes of text size reduction, 478 bytes removed and 36 bytes added.
Link: https://lkml.kernel.org/r/20221019183332.2802139-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65 |
|
#
cb691e2f |
| 02-Sep-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm: remove lookup_swap_cache()
All callers have now been converted to swap_cache_get_folio(), so we can remove this wrapper.
Link: https://lkml.kernel.org/r/20220902194653.1739778-39-willy@infradea
mm: remove lookup_swap_cache()
All callers have now been converted to swap_cache_get_folio(), so we can remove this wrapper.
Link: https://lkml.kernel.org/r/20220902194653.1739778-39-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
c9edc242 |
| 02-Sep-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
swap: add swap_cache_get_folio()
Convert lookup_swap_cache() into swap_cache_get_folio() and add a lookup_swap_cache() wrapper around it.
[akpm@linux-foundation.org: add CONFIG_SWAP=n stub for swap
swap: add swap_cache_get_folio()
Convert lookup_swap_cache() into swap_cache_get_folio() and add a lookup_swap_cache() wrapper around it.
[akpm@linux-foundation.org: add CONFIG_SWAP=n stub for swap_cache_get_folio()] Link: https://lkml.kernel.org/r/20220902194653.1739778-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
a4c366f0 |
| 02-Sep-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/swap: convert add_to_swap_cache() to take a folio
With all callers using folios, we can convert add_to_swap_cache() to take a folio and use it throughout.
Link: https://lkml.kernel.org/r/2022090
mm/swap: convert add_to_swap_cache() to take a folio
With all callers using folios, we can convert add_to_swap_cache() to take a folio and use it throughout.
Link: https://lkml.kernel.org/r/20220902194653.1739778-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.64, v5.15.63, v5.15.62, v5.15.61 |
|
#
cf1e3fe4 |
| 11-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
mm/swap: remove the end_write_func argument to __swap_writepage
The argument is always set to end_swap_bio_write, so remove the argument and mark end_swap_bio_write static.
Link: https://lkml.kerne
mm/swap: remove the end_write_func argument to __swap_writepage
The argument is always set to end_swap_bio_write, so remove the argument and mark end_swap_bio_write static.
Link: https://lkml.kernel.org/r/20220811141741.660214-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50 |
|
#
1baec203 |
| 25-Jun-2022 |
Miaohe Lin <linmiaohe@huawei.com> |
mm/khugepaged: try to free transhuge swapcache when possible
Transhuge swapcaches won't be freed in __collapse_huge_page_copy(). It's because release_pte_page() is not called for these pages and th
mm/khugepaged: try to free transhuge swapcache when possible
Transhuge swapcaches won't be freed in __collapse_huge_page_copy(). It's because release_pte_page() is not called for these pages and thus free_page_and_swap_cache can't grab the page lock. These pages won't be freed from swap cache even if we are the only user until next time reclaim. It shouldn't hurt indeed, but we could try to free these pages to save more memory for system.
Link: https://lkml.kernel.org/r/20220625092816.4856-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.49 |
|
#
ceff9d33 |
| 17-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/swap: convert __delete_from_swap_cache() to a folio
All callers now have a folio, so convert the entire function to operate on folios.
Link: https://lkml.kernel.org/r/20220617175020.717127-23-wi
mm/swap: convert __delete_from_swap_cache() to a folio
All callers now have a folio, so convert the entire function to operate on folios.
Link: https://lkml.kernel.org/r/20220617175020.717127-23-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
75fa68a5 |
| 17-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/swap: convert delete_from_swap_cache() to take a folio
All but one caller already has a folio, so convert it to use a folio.
Link: https://lkml.kernel.org/r/20220617175020.717127-22-willy@infrad
mm/swap: convert delete_from_swap_cache() to take a folio
All but one caller already has a folio, so convert it to use a folio.
Link: https://lkml.kernel.org/r/20220617175020.717127-22-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
b98c359f |
| 17-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm: convert page_swap_flags to folio_swap_flags
The only caller already has a folio, so push the folio->page conversion down a level.
Link: https://lkml.kernel.org/r/20220617175020.717127-21-willy@
mm: convert page_swap_flags to folio_swap_flags
The only caller already has a folio, so push the folio->page conversion down a level.
Link: https://lkml.kernel.org/r/20220617175020.717127-21-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40 |
|
#
09c02e56 |
| 12-May-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
swap: convert add_to_swap() to take a folio
The only caller already has a folio available, so this saves a conversion. Also convert the return type to boolean.
Link: https://lkml.kernel.org/r/20220
swap: convert add_to_swap() to take a folio
The only caller already has a folio available, so this saves a conversion. Also convert the return type to boolean.
Link: https://lkml.kernel.org/r/20220504182857.4013401-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.39 |
|
#
2282679f |
| 09-May-2022 |
NeilBrown <neilb@suse.de> |
mm: submit multipage write for SWP_FS_OPS swap-space
swap_writepage() is given one page at a time, but may be called repeatedly in succession.
For block-device swapspace, the blk_plug functionality
mm: submit multipage write for SWP_FS_OPS swap-space
swap_writepage() is given one page at a time, but may be called repeatedly in succession.
For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads.
With this patch we pass a pointer-to-pointer via the wbc. swap_writepage can store state between calls - much like the pointer passed explicitly to swap_readpage. After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug() which can submit the combined request.
Link: https://lkml.kernel.org/r/164859778128.29473.5191868522654408537.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
5169b844 |
| 09-May-2022 |
NeilBrown <neilb@suse.de> |
mm: submit multipage reads for SWP_FS_OPS swap-space
swap_readpage() is given one page at a time, but may be called repeatedly in succession.
For block-device swap-space, the blk_plug functionality
mm: submit multipage reads for SWP_FS_OPS swap-space
swap_readpage() is given one page at a time, but may be called repeatedly in succession.
For block-device swap-space, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads.
With this patch we pass in a pointer-to-pointer when swap_readpage can store state between calls - much like the effect of blk_plug. After calling swap_readpage() some number of times, the state will be passed to swap_read_unplug() which can submit the combined request.
Link: https://lkml.kernel.org/r/164859778127.29473.14059420492644907783.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
e1209d3a |
| 09-May-2022 |
NeilBrown <neilb@suse.de> |
mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space
swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not
mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space
swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not most efficient.
swap uses ->direct_IO for writes which while this is adequate is an inappropriate over-loading. ->direct_IO may need to had handle allocate space for holes or other details that are not relevant for swap.
So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it.
No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem actually works with filesystem-based swap. Only two filesystems set SWP_FS_OPS: - cifs sets the flag, but ->direct_IO always fails so swap cannot work. - nfs sets the flag, but ->direct_IO calls generic_write_checks() which has failed on swap files for several releases.
To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both NFS and cifs are changed to fail if ->swap_rw is not set. This can be removed if/when the function is added.
Future patches will restore swap-over-NFS functionality.
To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures.
Link: https://lkml.kernel.org/r/164859778125.29473.13430559328221330589.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
d791ea67 |
| 09-May-2022 |
NeilBrown <neilb@suse.de> |
mm: reclaim mustn't enter FS for SWP_FS_OPS swap-space
If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement f
mm: reclaim mustn't enter FS for SWP_FS_OPS swap-space
If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement for swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used.
This makes the calculation of "may_enter_fs" slightly more complex, so move it into a separate function. with that done, there is little value in maintaining the bool variable any more. So replace the may_enter_fs variable with a may_enter_fs() function. This removes any risk for the variable becoming out-of-date.
Link: https://lkml.kernel.org/r/164859778124.29473.16176717935781721855.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
014bb1de |
| 09-May-2022 |
NeilBrown <neilb@suse.de> |
mm: create new mm/swap.h header file
Patch series "MM changes to improve swap-over-NFS support".
Assorted improvements for swap-via-filesystem.
This is a resend of these patches, rebased on curren
mm: create new mm/swap.h header file
Patch series "MM changes to improve swap-over-NFS support".
Assorted improvements for swap-via-filesystem.
This is a resend of these patches, rebased on current HEAD. The only substantial changes is that swap_dirty_folio has replaced swap_set_page_dirty.
Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It has previously worked for NFS but that broke a few releases back. This series changes to use a new ->swap_rw rather than ->readpage and ->direct_IO. It also makes other improvements.
There is a companion series already in linux-next which fixes various issues with NFS. Once both series land, a final patch is needed which changes NFS over to use ->swap_rw.
This patch (of 10):
Many functions declared in include/linux/swap.h are only used within mm/
Create a new "mm/swap.h" and move some of these declarations there. Remove the redundant 'extern' from the function declarations.
[akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h] Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|