Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43 |
|
#
925c86a1 |
| 01-Aug-2023 |
Christoph Hellwig <hch@lst.de> |
fs: add CONFIG_BUFFER_HEAD
Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it.
For the block device nodes and
fs: add CONFIG_BUFFER_HEAD
Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it.
For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist.
Otherwise this is just Kconfig and ifdef changes.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
4ce02c67 |
| 10-Jul-2023 |
Ritesh Harjani (IBM) <ritesh.list@gmail.com> |
iomap: Add per-block dirty state tracking to improve performance
When filesystem blocksize is less than folio size (either with mapping_large_folio_support() or with blocksize < pagesize) and when t
iomap: Add per-block dirty state tracking to improve performance
When filesystem blocksize is less than folio size (either with mapping_large_folio_support() or with blocksize < pagesize) and when the folio is uptodate in pagecache, then even a byte write can cause an entire folio to be written to disk during writeback. This happens because we currently don't have a mechanism to track per-block dirty state within struct iomap_folio_state. We currently only track uptodate state.
This patch implements support for tracking per-block dirty state in iomap_folio_state->state bitmap. This should help improve the filesystem write performance and help reduce write amplification.
Performance testing of below fio workload reveals ~16x performance improvement using nvme with XFS (4k blocksize) on Power (64K pagesize) FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
1. <test_randwrite.fio> [global] ioengine=psync rw=randwrite overwrite=1 pre_read=1 direct=0 bs=4k size=1G dir=./ numjobs=8 fdatasync=1 runtime=60 iodepth=64 group_reporting=1
[fio-run]
2. Also our internal performance team reported that this patch improves their database workload performance by around ~83% (with XFS on Power)
Reported-by: Aravinda Herle <araherle@in.ibm.com> Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30 |
|
#
d6bb59a9 |
| 19-May-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iomap: Create large folios in the buffered write path
Use the size of the write as a hint for the size of the folio to create.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-
iomap: Create large folios in the buffered write path
Use the size of the write as a hint for the size of the folio to create.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3 |
|
#
d3bff1fc |
| 21-Apr-2023 |
Ritesh Harjani (IBM) <ritesh.list@gmail.com> |
iomap: Remove IOMAP_DIO_NOSYNC unused dio flag
IOMAP_DIO_NOSYNC earlier was added for use in btrfs. But it seems for aio dsync writes this is not useful anyway. For aio dsync case, we we queue the r
iomap: Remove IOMAP_DIO_NOSYNC unused dio flag
IOMAP_DIO_NOSYNC earlier was added for use in btrfs. But it seems for aio dsync writes this is not useful anyway. For aio dsync case, we we queue the request and return -EIOCBQUEUED. Now, since IOMAP_DIO_NOSYNC doesn't let iomap_dio_complete() to call generic_write_sync(), hence we may lose the sync write.
Hence kill this flag as it is not in use by any FS now.
Tested-by: Disha Goel <disgoel@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8 |
|
#
8e81aa16 |
| 21-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
iomap: remove IOMAP_F_ZONE_APPEND
No users left now that btrfs takes REQ_OP_WRITE bios from iomap and splits and converts them to REQ_OP_ZONE_APPEND internally.
Reviewed-by: Josef Bacik <josef@toxi
iomap: remove IOMAP_F_ZONE_APPEND
No users left now that btrfs takes REQ_OP_WRITE bios from iomap and splits and converts them to REQ_OP_ZONE_APPEND internally.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.7 |
|
#
471859f5 |
| 15-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Rename page_ops to folio_ops
The operations in struct page_ops all operate on folios, so rename struct page_ops to struct folio_ops.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
iomap: Rename page_ops to folio_ops
The operations in struct page_ops all operate on folios, so rename struct page_ops to struct folio_ops.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> [djwong: port around not removing iomap_valid] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
c82abc23 |
| 15-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Rename page_prepare handler to get_folio
The ->page_prepare() handler in struct iomap_page_ops is now somewhat misnamed, so rename it to ->get_folio().
Signed-off-by: Andreas Gruenbacher <ag
iomap: Rename page_prepare handler to get_folio
The ->page_prepare() handler in struct iomap_page_ops is now somewhat misnamed, so rename it to ->get_folio().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
9060bc4d |
| 15-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap/gfs2: Get page in page_prepare handler
Change the iomap ->page_prepare() handler to get and return a locked folio instead of doing that in iomap_write_begin(). This allows to recover from out
iomap/gfs2: Get page in page_prepare handler
Change the iomap ->page_prepare() handler to get and return a locked folio instead of doing that in iomap_write_begin(). This allows to recover from out-of-memory situations in ->page_prepare(), which eliminates the corresponding error handling code in iomap_write_begin(). The ->put_folio() handler now also isn't called with NULL as the folio value anymore.
Filesystems are expected to use the iomap_get_folio() helper for getting locked folios in their ->page_prepare() handlers.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
98321b51 |
| 15-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Add iomap_get_folio helper
Add an iomap_get_folio() helper that gets a folio reference based on an iomap iterator and an offset into the address space. Use it in iomap_write_begin().
Signed
iomap: Add iomap_get_folio helper
Add an iomap_get_folio() helper that gets a folio reference based on an iomap iterator and an offset into the address space. Use it in iomap_write_begin().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
40405ddd |
| 15-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Rename page_done handler to put_folio
The ->page_done() handler in struct iomap_page_ops is now somewhat misnamed in that it mainly deals with unlocking and putting a folio, so rename it to -
iomap: Rename page_done handler to put_folio
The ->page_done() handler in struct iomap_page_ops is now somewhat misnamed in that it mainly deals with unlocking and putting a folio, so rename it to ->put_folio().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
80baab88 |
| 15-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap/gfs2: Unlock and put folio in page_done handler
When an iomap defines a ->page_done() handler in its page_ops, delegate unlocking the folio and putting the folio reference to that handler.
Th
iomap/gfs2: Unlock and put folio in page_done handler
When an iomap defines a ->page_done() handler in its page_ops, delegate unlocking the folio and putting the folio reference to that handler.
This allows to fix a race between journaled data writes and folio writeback in gfs2: before this change, gfs2_iomap_page_done() was called after unlocking the folio, so writeback could start writing back the folio's buffers before they could be marked for writing to the journal. Also, try_to_free_buffers() could free the buffers before gfs2_iomap_page_done() was done adding the buffers to the current current transaction. With this change, gfs2_iomap_page_done() adds the buffers to the current transaction while the folio is still locked, so the problems described above can no longer occur.
The only current user of ->page_done() is gfs2, so other filesystems are not affected. To catch out any out-of-tree users, switch from a page to a folio in ->page_done().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11 |
|
#
d7b64041 |
| 28-Nov-2022 |
Dave Chinner <dchinner@redhat.com> |
iomap: write iomap validity checks
A recent multithreaded write data corruption has been uncovered in the iomap write code. The core of the problem is partial folio writes can be flushed to disk whi
iomap: write iomap validity checks
A recent multithreaded write data corruption has been uncovered in the iomap write code. The core of the problem is partial folio writes can be flushed to disk while a new racing write can map it and fill the rest of the page:
writeback new write
allocate blocks blocks are unwritten submit IO ..... map blocks iomap indicates UNWRITTEN range loop { lock folio copyin data ..... IO completes runs unwritten extent conv blocks are marked written <iomap now stale> get next folio }
Now add memory pressure such that memory reclaim evicts the partially written folio that has already been written to disk.
When the new write finally gets to the last partial page of the new write, it does not find it in cache, so it instantiates a new page, sees the iomap is unwritten, and zeros the part of the page that it does not have data from. This overwrites the data on disk that was originally written.
The full description of the corruption mechanism can be found here:
https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/
To solve this problem, we need to check whether the iomap is still valid after we lock each folio during the write. We have to do it after we lock the page so that we don't end up with state changes occurring while we wait for the folio to be locked.
Hence we need a mechanism to be able to check that the cached iomap is still valid (similar to what we already do in buffered writeback), and we need a way for ->begin_write to back out and tell the high level iomap iterator that we need to remap the remaining write range.
The iomap needs to grow some storage for the validity cookie that the filesystem provides to travel with the iomap. XFS, in particular, also needs to know some more information about what the iomap maps (attribute extents rather than file data extents) to for the validity cookie to cover all the types of iomaps we might need to validate.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.0.10, v5.15.80 |
|
#
9c7babf9 |
| 22-Nov-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs,iomap: move delalloc punching to iomap
Because that's what Christoph wants for this error handling path only XFS uses.
It requires a new iomap export for handling errors over delalloc ranges. T
xfs,iomap: move delalloc punching to iomap
Because that's what Christoph wants for this error handling path only XFS uses.
It requires a new iomap export for handling errors over delalloc ranges. This is basically the XFS code as is stands, but even though Christoph wants this as iomap funcitonality, we still have to call it from the filesystem specific ->iomap_end callback, and call into the iomap code with yet another filesystem specific callback to punch the delalloc extent within the defined ranges.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46 |
|
#
2ec810d5 |
| 06-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/migrate: Add filemap_migrate_folio()
There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to
mm/migrate: Add filemap_migrate_folio()
There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to be filemap_migrate_folio() and convert the iomap filesystems to use it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
478af190 |
| 22-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
iomap: remove iomap_writepage
Unused now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.co
iomap: remove iomap_writepage
Unused now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
36518b6b |
| 07-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
teach iomap_dio_rw() to suppress dsync
New flag, equivalent to removal of IOCB_DSYNC from iocb flags. This mimics what btrfs is doing (and that's what btrfs will switch to). However, I'm not at all
teach iomap_dio_rw() to suppress dsync
New flag, equivalent to removal of IOCB_DSYNC from iocb flags. This mimics what btrfs is doing (and that's what btrfs will switch to). However, I'm not at all sure that we want to suppress REQ_FUA for those - all btrfs hack really cares about is suppression of generic_write_sync(). For now let's keep the existing behaviour, but I really want to hear more detailed arguments pro or contra.
[folded brain fix from willy]
Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
Revision tags: v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38 |
|
#
786f847f |
| 05-May-2022 |
Christoph Hellwig <hch@lst.de> |
iomap: add per-iomap_iter private data
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there.
Reviewed-by: Darri
iomap: add per-iomap_iter private data
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
908c5490 |
| 05-May-2022 |
Christoph Hellwig <hch@lst.de> |
iomap: allow the file system to provide a bio_set for direct I/O
Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->subm
iomap: allow the file system to provide a bio_set for direct I/O
Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use.
To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.37 |
|
#
8597447d |
| 30-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iomap: Convert to release_folio
Change all the filesystems which used iomap_releasepage to use the new function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layto
iomap: Convert to release_folio
Change all the filesystems which used iomap_releasepage to use the new function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
show more ...
|
#
7479c505 |
| 29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Convert iomap_readpage to iomap_read_folio
A straightforward conversion as iomap_readpage already worked in folios.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
Revision tags: v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23 |
|
#
d82354f6 |
| 09-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iomap: Remove iomap_invalidatepage()
Use iomap_invalidate_folio() in all the iomap-based filesystems and rename the iomap_invalidatepage tracepoint.
Signed-off-by: Matthew Wilcox (Oracle) <willy@in
iomap: Remove iomap_invalidatepage()
Use iomap_invalidate_folio() in all the iomap-based filesystems and rename the iomap_invalidatepage tracepoint.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
show more ...
|
#
2e7e80f7 |
| 09-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Convert is_partially_uptodate to folios
Since the uptodate property is maintained on a per-folio basis, the is_partially_uptodate method should also take a folio. Fix the types at the same time
fs: Convert is_partially_uptodate to folios
Since the uptodate property is maintained on a per-folio basis, the is_partially_uptodate method should also take a folio. Fix the types at the same time so it's clear that it returns true/false and takes the count in bytes, not blocks.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
show more ...
|
Revision tags: v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17 |
|
#
ebb7fb15 |
| 26-Jan-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs, iomap: limit individual ioend chain lengths in writeback
Trond Myklebust reported soft lockups in XFS IO completion such as this:
watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/1
xfs, iomap: limit individual ioend chain lengths in writeback
Trond Myklebust reported soft lockups in XFS IO completion such as this:
watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106] CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1 Workqueue: xfs-conv/md127 xfs_end_io [xfs] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20 Call Trace: wake_up_page_bit+0x8a/0x110 iomap_finish_ioend+0xd7/0x1c0 iomap_finish_ioends+0x7f/0xb0 xfs_end_ioend+0x6b/0x100 [xfs] xfs_end_io+0xb9/0xe0 [xfs] process_one_work+0x1a7/0x360 worker_thread+0x1fa/0x390 kthread+0x116/0x130 ret_from_fork+0x35/0x40
Ioends are processed as an atomic completion unit when all the chained bios in the ioend have completed their IO. Logically contiguous ioends can also be merged and completed as a single, larger unit. Both of these things can be problematic as both the bio chains per ioend and the size of the merged ioends processed as a single completion are both unbound.
If we have a large sequential dirty region in the page cache, write_cache_pages() will keep feeding us sequential pages and we will keep mapping them into ioends and bios until we get a dirty page at a non-sequential file offset. These large sequential runs can will result in bio and ioend chaining to optimise the io patterns. The pages iunder writeback are pinned within these chains until the submission chaining is broken, allowing the entire chain to be completed. This can result in huge chains being processed in IO completion context.
We get deep bio chaining if we have large contiguous physical extents. We will keep adding pages to the current bio until it is full, then we'll chain a new bio to keep adding pages for writeback. Hence we can build bio chains that map millions of pages and tens of gigabytes of RAM if the page cache contains big enough contiguous dirty file regions. This long bio chain pins those pages until the final bio in the chain completes and the ioend can iterate all the chained bios and complete them.
OTOH, if we have a physically fragmented file, we end up submitting one ioend per physical fragment that each have a small bio or bio chain attached to them. We do not chain these at IO submission time, but instead we chain them at completion time based on file offset via iomap_ioend_try_merge(). Hence we can end up with unbound ioend chains being built via completion merging.
XFS can then do COW remapping or unwritten extent conversion on that merged chain, which involves walking an extent fragment at a time and running a transaction to modify the physical extent information. IOWs, we merge all the discontiguous ioends together into a contiguous file range, only to then process them individually as discontiguous extents.
This extent manipulation is computationally expensive and can run in a tight loop, so merging logically contiguous but physically discontigous ioends gains us nothing except for hiding the fact the fact we broke the ioends up into individual physical extents at submission and then need to loop over those individual physical extents at completion.
Hence we need to have mechanisms to limit ioend sizes and to break up completion processing of large merged ioend chains:
1. bio chains per ioend need to be bound in length. Pure overwrites go straight to iomap_finish_ioend() in softirq context with the exact bio chain attached to the ioend by submission. Hence the only way to prevent long holdoffs here is to bound ioend submission sizes because we can't reschedule in softirq context.
2. iomap_finish_ioends() has to handle unbound merged ioend chains correctly. This relies on any one call to iomap_finish_ioend() being bound in runtime so that cond_resched() can be issued regularly as the long ioend chain is processed. i.e. this relies on mechanism #1 to limit individual ioend sizes to work correctly.
3. filesystems have to loop over the merged ioends to process physical extent manipulations. This means they can loop internally, and so we break merging at physical extent boundaries so the filesystem can easily insert reschedule points between individual extent manipulations.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
6e478521 |
| 30-Jul-2021 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iomap,xfs: Convert ->discard_page to ->discard_folio
XFS has the only implementation of ->discard_page today, so convert it to use folios in the same patch as converting the API.
Signed-off-by: Mat
iomap,xfs: Convert ->discard_page to ->discard_folio
XFS has the only implementation of ->discard_page today, so convert it to use folios in the same patch as converting the API.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116 |
|
#
8306a5f5 |
| 28-Apr-2021 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iomap: Add iomap_invalidate_folio
Keep iomap_invalidatepage around as a wrapper for use in address_space operations.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christ
iomap: Add iomap_invalidate_folio
Keep iomap_invalidatepage around as a wrapper for use in address_space operations.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|