Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
468971c3 |
| 26-Apr-2024 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
468971c3 |
| 26-Apr-2024 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
468971c3 |
| 26-Apr-2024 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
468971c3 |
| 26-Apr-2024 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
468971c3 |
| 26-Apr-2024 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
468971c3 |
| 26-Apr-2024 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that
mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69 upstream.
See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead path"), ensure that page_cache_ra_order() do not attempt to reclaim file-backed pages too, or it leads to a deadlock, found issue when test ext4 large folio.
INFO: task DataXceiver for:7494 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200 Call trace: __switch_to+0x14c/0x240 __schedule+0x82c/0xdd0 schedule+0x58/0xf0 io_schedule+0x24/0xa0 __folio_lock+0x130/0x300 migrate_pages_batch+0x378/0x918 migrate_pages+0x350/0x700 compact_zone+0x63c/0xb38 compact_zone_order+0xc0/0x118 try_to_compact_pages+0xb0/0x280 __alloc_pages_direct_compact+0x98/0x248 __alloc_pages+0x510/0x1110 alloc_pages+0x9c/0x130 folio_alloc+0x20/0x78 filemap_alloc_folio+0x8c/0x1b0 page_cache_ra_order+0x174/0x308 ondemand_readahead+0x1c8/0x2b8 page_cache_async_ra+0x68/0xb8 filemap_readahead.isra.0+0x64/0xa8 filemap_get_pages+0x3fc/0x5b0 filemap_splice_read+0xf4/0x280 ext4_file_splice_read+0x2c/0x48 [ext4] vfs_splice_read.part.0+0xa8/0x118 splice_direct_to_actor+0xbc/0x288 do_splice_direct+0x9c/0x108 do_sendfile+0x328/0x468 __arm64_sys_sendfile64+0x8c/0x148 invoke_syscall+0x4c/0x118 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x4c/0x1f8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190
Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10 |
|
#
de5c36ab |
| 04-Jan-2024 |
Jan Kara <jack@suse.cz> |
readahead: avoid multiple marked readahead pages
[ Upstream commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ]
ra_alloc_folio() marks a page that should trigger next round of async readahead. Howev
readahead: avoid multiple marked readahead pages
[ Upstream commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ]
ra_alloc_folio() marks a page that should trigger next round of async readahead. However it rounds up computed index to the order of page being allocated. This can however lead to multiple consecutive pages being marked with readahead flag. Consider situation with index == 1, mark == 1, order == 0. We insert order 0 page at index 1 and mark it. Then we bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page at index 2 is marked as well. Then we bump order to 2, index is incremented to 4, mark gets rounded to 4 so page at index 4 is marked as well. The fact that multiple pages get marked within a single readahead window confuses the readahead logic and results in readahead window being trimmed back to 1. This situation is triggered in particular when maximum readahead window size is not a power of two (in the observed case it was 768 KB) and as a result sequential read throughput suffers.
Fix the problem by rounding 'mark' down instead of up. Because the index is naturally aligned to 'order', we are guaranteed 'rounded mark' == index iff 'mark' is within the page we are allocating at 'index' and thus exactly one page is marked with readahead flag as required by the readahead code and sequential read performance is restored.
This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix readahead with large folios"). The commit changed the rounding with the rationale:
"... we were setting the readahead flag on the folio which contains the last byte read from the block. This is wrong because we will trigger readahead at the end of the read without waiting to see if a subsequent read is going to use the pages we just read."
Although this is true, the fact is this was always the case with read sizes not aligned to folio boundaries and large folios in the page cache just make the situation more obvious (and frequent). Also for sequential read workloads it is better to trigger the readahead earlier rather than later. It is true that the difference in the rounding and thus earlier triggering of the readahead can result in reading more for semi-random workloads. However workloads really suffering from this seem to be rare. In particular I have verified that the workload described in commit b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading random 100k blocks from a file like:
[reader] bs=100k rw=randread numjobs=1 size=64g runtime=60s
is not impacted by the rounding change and achieves ~70MB/s in both cases.
[jack@suse.cz: fix one more place where mark rounding was done as well] Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Guo Xuenan <guoxuenan@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6 |
|
#
b900eeff |
| 02-Oct-2023 |
Reuben Hawkins <reubenhwk@gmail.com> |
vfs: fix readahead(2) on block devices
[ Upstream commit 7116c0af4b8414b2f19fdb366eea213cbd9d91c2 ]
Readahead was factored to call generic_fadvise. That refactor added an S_ISREG restriction which
vfs: fix readahead(2) on block devices
[ Upstream commit 7116c0af4b8414b2f19fdb366eea213cbd9d91c2 ]
Readahead was factored to call generic_fadvise. That refactor added an S_ISREG restriction which broke readahead on block devices.
In addition to S_ISREG, this change checks S_ISBLK to fix block device readahead. There is no change in behavior with any file type besides block devices in this change.
Fixes: 3d8f7615319b ("vfs: implement readahead(2) using POSIX_FADV_WILLNEED") Signed-off-by: Reuben Hawkins <reubenhwk@gmail.com> Link: https://lore.kernel.org/r/20231003015704.2415-1-reubenhwk@gmail.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30 |
|
#
4f661701 |
| 19-May-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
filemap: Allow __filemap_get_folio to allocate large folios
Allow callers of __filemap_get_folio() to specify a preferred folio order in the FGP flags. This is only honoured in the FGP_CREATE path;
filemap: Allow __filemap_get_folio to allocate large folios
Allow callers of __filemap_get_folio() to specify a preferred folio order in the FGP flags. This is only honoured in the FGP_CREATE path; if there is already a folio in the page cache that covers the index, we will return it, no matter what its order is. No create-around is attempted; we will only create folios which start at the specified index. Unmodified callers will continue to allocate order 0 folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
994ec4e2 |
| 21-Jun-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm: remove unnecessary pagevec includes
These files no longer need pagevec.h, mostly due to function declarations being moved out of it.
Link: https://lkml.kernel.org/r/20230621164557.3510324-14-wi
mm: remove unnecessary pagevec includes
These files no longer need pagevec.h, mostly due to function declarations being moved out of it.
Link: https://lkml.kernel.org/r/20230621164557.3510324-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7 |
|
#
11a98042 |
| 16-Jan-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
readahead: convert readahead_expand() to use a folio
Replace the uses of page with a folio. Also add a missing test for workingset in the leading edge expansion.
Link: https://lkml.kernel.org/r/20
readahead: convert readahead_expand() to use a folio
Replace the uses of page with a folio. Also add a missing test for workingset in the leading edge expansion.
Link: https://lkml.kernel.org/r/20230116193941.2148487-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69 |
|
#
17604240 |
| 15-Sep-2022 |
Christoph Hellwig <hch@lst.de> |
mm: add PSI accounting around ->read_folio and ->readahead calls
PSI tries to account for the cost of bringing back in pages discarded by the MM LRU management. Currently the prime place for that i
mm: add PSI accounting around ->read_folio and ->readahead calls
PSI tries to account for the cost of bringing back in pages discarded by the MM LRU management. Currently the prime place for that is hooked into the bio submission path, which is a rather bad place:
- it does not actually account I/O for non-block file systems, of which we have many - it adds overhead and a layering violation to the block layer
Add the accounting into the two places in the core MM code that read pages into an address space by calling into ->read_folio and ->readahead so that the entire file system operations are covered, to broaden the coverage and allow removing the accounting in the block layer going forward.
As psi_memstall_enter can deal with nested calls this will not lead to double accounting even while the bio annotations are still present.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220915094200.139713-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49 |
|
#
00fa15e0 |
| 20-Jun-2022 |
Alistair Popple <apopple@nvidia.com> |
filemap: Fix serialization adding transparent huge pages to page cache
Commit 793917d997df ("mm/readahead: Add large folio readahead") introduced support for using large folios for filebacked pages
filemap: Fix serialization adding transparent huge pages to page cache
Commit 793917d997df ("mm/readahead: Add large folio readahead") introduced support for using large folios for filebacked pages if the filesystem supports it.
page_cache_ra_order() was introduced to allocate and add these large folios to the page cache. However adding pages to the page cache should be serialized against truncation and hole punching by taking invalidate_lock. Not doing so can lead to data races resulting in stale data getting added to the page cache and marked up-to-date. See commit 730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock") for more details.
This issue was found by inspection but a testcase revealed it was possible to observe in practice on XFS. Fix this by taking invalidate_lock in page_cache_ra_order(), to mirror what is done for the non-thp case in page_cache_ra_unbounded().
Signed-off-by: Alistair Popple <apopple@nvidia.com> Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.48, v5.15.47, v5.15.46 |
|
#
6bf74cdd |
| 07-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
filemap: Don't release a locked folio
We must hold a reference over the call to filemap_release_folio(), otherwise the page cache will put the last reference to the folio before we unlock it, leadin
filemap: Don't release a locked folio
We must hold a reference over the call to filemap_release_folio(), otherwise the page cache will put the last reference to the folio before we unlock it, leading to splats like this:
BUG: Bad page state in process u8:5 pfn:1ab1f4 page:ffffea0006ac7d00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x28b1de pfn:0x1ab1f4 flags: 0x17ff80000040001(locked|reclaim|node=0|zone=2|lastcpupid=0xfff) raw: 017ff80000040001 dead000000000100 dead000000000122 0000000000000000 raw: 000000000028b1de 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
It's an error path, so it doesn't see much testing.
Reported-by: Darrick J. Wong <djwong@kernel.org> Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37 |
|
#
7e0a1265 |
| 29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm,fs: Remove aops->readpage
With all implementations of aops->readpage converted to aops->read_folio, we can stop checking whether it's set and remove the member from aops.
Signed-off-by: Matthew
mm,fs: Remove aops->readpage
With all implementations of aops->readpage converted to aops->read_folio, we can stop checking whether it's set and remove the member from aops.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
5efe7448 |
| 29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Introduce aops->read_folio
Change all the callers of ->readpage to call ->read_folio in preference, if it exists. This is a transitional duplication, and will be removed by the end of the serie
fs: Introduce aops->read_folio
Change all the callers of ->readpage to call ->read_folio in preference, if it exists. This is a transitional duplication, and will be removed by the end of the series.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.36, v5.15.35, v5.15.34, v5.15.33 |
|
#
a42634a6 |
| 31-Mar-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
readahead: Use a folio in read_pages()
Handle multi-page folios correctly and removes a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Chris
readahead: Use a folio in read_pages()
Handle multi-page folios correctly and removes a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
show more ...
|
#
b9ff43dd |
| 27-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/readahead: Fix readahead with large folios
Reading 100KB chunks from a big file (eg dd bs=100K) leads to poor readahead behaviour. Studying the traces in detail, I noticed two problems.
The fir
mm/readahead: Fix readahead with large folios
Reading 100KB chunks from a big file (eg dd bs=100K) leads to poor readahead behaviour. Studying the traces in detail, I noticed two problems.
The first is that we were setting the readahead flag on the folio which contains the last byte read from the block. This is wrong because we will trigger readahead at the end of the read without waiting to see if a subsequent read is going to use the pages we just read. Instead, we need to set the readahead flag on the first folio _after_ the one which contains the last byte that we're reading.
The second is that we were looking for the index of the folio with the readahead flag set to exactly match the start + size - async_size. If we've rounded this, either down (as previously) or up (as now), we'll think we hit a folio marked as readahead by a different read, and try to read the wrong pages. So round the expected index to the order of the folio we hit.
Reported-by: Guo Xuenan <guoxuenan@huawei.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
c97ab271 |
| 19-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>
Remove all the includes that aren't actually needed from <linux/blk-cgroup.h> and push them to the actual source files where needed.
S
blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>
Remove all the includes that aren't actually needed from <linux/blk-cgroup.h> and push them to the actual source files where needed.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
59c10c52 |
| 05-Apr-2022 |
Guo Ren <guoren@linux.alibaba.com> |
riscv: compat: syscall: Add compat_sys_call_table implementation
Implement compat sys_call_table and some system call functions: truncate64, ftruncate64, fallocate, pread64, pwrite64, sync_file_rang
riscv: compat: syscall: Add compat_sys_call_table implementation
Implement compat sys_call_table and some system call functions: truncate64, ftruncate64, fallocate, pread64, pwrite64, sync_file_range, readahead, fadvise64_64 which need argument translation.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20220405071314.3225832-12-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
#
1e470280 |
| 31-Mar-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
readahead: Update comments
- Refer to folios where appropriate, not pages (Matthew Wilcox) - Eliminate references to the internal PG_readhead - Use "readahead" consistently - not "read-ahead" or
readahead: Update comments
- Refer to folios where appropriate, not pages (Matthew Wilcox) - Eliminate references to the internal PG_readhead - Use "readahead" consistently - not "read-ahead" or "read ahead" (mostly Neil Brown) - Clarify some sections that, on reflection, weren't very clear (Neil Brown) - Minor punctuation/spelling fixes (Neil Brown)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
b4e089d7 |
| 31-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
mm: remove the skip_page argument to read_pages
The skip_page argument to read_pages controls if rac->_index is incremented before returning from the function. Just open code that in the callers.
mm: remove the skip_page argument to read_pages
The skip_page argument to read_pages controls if rac->_index is incremented before returning from the function. Just open code that in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
dfd8b4fc |
| 31-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
mm: remove the pages argument to read_pages
This is always an empty list or NULL with the removal of the ->readahead support, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by
mm: remove the pages argument to read_pages
This is always an empty list or NULL with the removal of the ->readahead support, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
Revision tags: v5.15.32 |
|
#
704528d8 |
| 23-Mar-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Remove ->readpages address space operation
All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it.
Sig
fs: Remove ->readpages address space operation
All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
Revision tags: v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17 |
|
#
ebf921a9 |
| 22-Jan-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
readahead: Remove read_cache_pages()
With no remaining users, remove this function and the related infrastructure.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christop
readahead: Remove read_cache_pages()
With no remaining users, remove this function and the related infrastructure.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|