#
a633f5a3 |
| 22-Feb-2014 |
Lukas Czerner <lczerner@redhat.com> |
ext4: translate fallocate mode bits to strings
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
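The log entry gives only the title. The userspace sketch below illustrates the general bit-to-name technique and is not the ext4 code. It turns a mode word into a `|`-separated string and prints any unrecognised bits in hex. Only the two `FALLOC_FL_*` values come from `<linux/falloc.h>`. The table and the helpers `falloc_mode_str()` and `append_name()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01  /* value from <linux/falloc.h> */
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02  /* value from <linux/falloc.h> */
#endif

struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Hypothetical bit-to-name table. */
static const struct flag_name falloc_names[] = {
	{ FALLOC_FL_KEEP_SIZE,  "KEEP_SIZE" },
	{ FALLOC_FL_PUNCH_HOLE, "PUNCH_HOLE" },
};

/* Append s to buf, separating entries with '|'; never overflows buf. */
static void append_name(char *buf, size_t len, const char *s)
{
	if (buf[0] != '\0')
		strncat(buf, "|", len - strlen(buf) - 1);
	strncat(buf, s, len - strlen(buf) - 1);
}

/* Translate a fallocate mode into "NAME|NAME"; unknown bits are printed in hex. */
static const char *falloc_mode_str(unsigned int mode, char *buf, size_t len)
{
	char hex[16];
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(falloc_names) / sizeof(falloc_names[0]); i++) {
		if (mode & falloc_names[i].bit) {
			append_name(buf, len, falloc_names[i].name);
			mode &= ~falloc_names[i].bit;  /* mark this bit as handled */
		}
	}
	if (mode) {
		snprintf(hex, sizeof(hex), "%#x", mode);
		append_name(buf, len, hex);
	}
	return buf;
}
```

A mode of `0x41` would render as `KEEP_SIZE|0x40`, so unknown bits stay visible instead of being silently dropped.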
|
#
19ea8060 |
| 16-Feb-2014 |
Theodore Ts'o <tytso@mit.edu> |
ext4: don't leave i_crtime.tv_sec uninitialized
If the i_crtime field is not present in the inode, don't leave the field uninitialized.
Fixes: ef7f38359 ("ext4: Add nanosecond timestamps") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
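Here is a minimal sketch of the fix, assuming simplified hypothetical structures. The creation time is decoded only when the on-disk inode is large enough to hold it. Otherwise it is explicitly zeroed. `struct disk_inode`, `struct mem_inode`, `CRTIME_END`, `crtime_fits()` and `decode_crtime()` are illustrative names, not ext4's.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the on-disk and in-memory inode. */
struct disk_inode {
	uint16_t extra_isize;  /* bytes used beyond the 128-byte base inode */
	uint32_t crtime;       /* creation time; valid only if it fits */
};

struct mem_inode {
	int64_t crtime_sec;
};

#define BASE_INODE_SIZE 128u
#define CRTIME_END      148u  /* hypothetical end offset of the crtime field */

/* True when the inode's extra space is large enough to contain crtime. */
static int crtime_fits(const struct disk_inode *raw)
{
	return BASE_INODE_SIZE + raw->extra_isize >= CRTIME_END;
}

static void decode_crtime(struct mem_inode *ei, const struct disk_inode *raw)
{
	if (crtime_fits(raw))
		ei->crtime_sec = (int32_t)raw->crtime;
	else
		ei->crtime_sec = 0;  /* the point of the fix: never leave it unset */
}
```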
|
Revision tags: v3.14-rc3, v3.14-rc2, v3.14-rc1, v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5 |
|
#
f5a44db5 |
| 20-Dec-2013 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add explicit casts when masking cluster sizes
The missing casts can cause the high 32 bits of the 64-bit physical block numbers to be lost. Set up new macros which allow us to make sure the right thing happens, even if at some point we end up supporting larger logical block numbers.
Thanks to Emese Revfy and the PaX security team for reporting this issue.
Reported-by: PaX Team <pageexec@freemail.hu> Reported-by: Emese Revfy <re.emese@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
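This bug class is easy to reproduce in plain C. The sketch below uses hypothetical names and is not the exact macros the commit adds. In the buggy version, `~(ratio - 1)` is evaluated as a 32-bit `unsigned int` and then zero-extended, so the mask wipes the upper half of a 64-bit block number. Casting to the 64-bit type before subtracting keeps those bits.

```c
#include <stdint.h>

typedef uint64_t fsblk_t;  /* 64-bit physical block number */

/* Buggy: ~(ratio - 1) is a 32-bit value that is zero-extended to 64 bits,
 * so the AND clears the high 32 bits of pblk. */
static fsblk_t cluster_start_buggy(fsblk_t pblk, unsigned int ratio)
{
	return pblk & ~(ratio - 1);
}

/* Fixed: widen ratio to 64 bits before subtracting and inverting. */
#define PBLK_CMASK(ratio, pblk) ((pblk) & ~((fsblk_t)(ratio) - 1))

static fsblk_t cluster_start_fixed(fsblk_t pblk, unsigned int ratio)
{
	return PBLK_CMASK(ratio, pblk);
}
```

With a cluster ratio of 16, block `0x123456789` maps to `0x23456780` in the buggy version and to `0x123456780` in the fixed one.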
|
Revision tags: v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1 |
|
#
3f61c0cc |
| 11-Nov-2013 |
Andreas Dilger <adilger@dilger.ca> |
ext4: add prototypes for macro-generated functions
It isn't very easy to find the declarations for the functions created by EXT4_INODE_BIT_FNS() because the names are generated by macros:
ext4_test_inode_flag, ext4_set_inode_flag, ext4_clear_inode_flag
ext4_test_inode_state, ext4_set_inode_state, ext4_clear_inode_state
Add explicit declarations for these functions so that grep and tags can find them.
Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
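The following sketch shows the pattern. A hypothetical `INODE_BIT_FNS()` macro stands in for `EXT4_INODE_BIT_FNS()`. Its token-pasted names never appear literally in the source, so the explicit prototypes are what grep and tags tools actually find. `struct inode_info` and the generated helper names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical inode with two flag words. */
struct inode_info {
	unsigned long flags;
	unsigned long state;
};

/* Generates test/set/clear helpers for one flag word. The names are built
 * with ##, so searching for "test_inode_flag" finds no definition. */
#define INODE_BIT_FNS(name, field)                                           \
static inline bool test_inode_##name(const struct inode_info *ii, int bit)  \
{ return (ii->field >> bit) & 1UL; }                                         \
static inline void set_inode_##name(struct inode_info *ii, int bit)         \
{ ii->field |= 1UL << bit; }                                                 \
static inline void clear_inode_##name(struct inode_info *ii, int bit)       \
{ ii->field &= ~(1UL << bit); }

/* Explicit prototypes so that grep and tags tools can index the names. */
static inline bool test_inode_flag(const struct inode_info *ii, int bit);
static inline void set_inode_flag(struct inode_info *ii, int bit);
static inline void clear_inode_flag(struct inode_info *ii, int bit);
static inline bool test_inode_state(const struct inode_info *ii, int bit);
static inline void set_inode_state(struct inode_info *ii, int bit);
static inline void clear_inode_state(struct inode_info *ii, int bit);

INODE_BIT_FNS(flag, flags)
INODE_BIT_FNS(state, state)
```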
|
Revision tags: v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6, v3.9-rc5, v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1, v3.8, v3.8-rc7, v3.8-rc6, v3.8-rc5, v3.8-rc4, v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6, v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6, v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4 |
|
#
375e289e |
| 18-Apr-2012 |
J. Bruce Fields <bfields@redhat.com> |
vfs: pull ext4's double-i_mutex-locking into common code
We want to do this elsewhere as well.
Also catch any attempts to use it for directories (where this ordering would conflict with ancestor-first directory ordering in lock_rename).
Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
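One common way to take two locks of the same class without deadlock is to always acquire them in a fixed order, for example by address. The sketch below assumes that ordering and uses pthread mutexes in place of `i_mutex`. `struct node`, `lock_two()` and `unlock_two()` are hypothetical names, and the sketch does not model the directory check the commit adds.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode carrying its own mutex. */
struct node {
	pthread_mutex_t lock;
};

/* Lock two non-directory nodes in address order, so two threads that pass
 * the same pair in opposite argument order cannot deadlock. */
static void lock_two(struct node *a, struct node *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct node *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	if (b != a)  /* the same node passed twice is locked only once */
		pthread_mutex_lock(&b->lock);
}

static void unlock_two(struct node *a, struct node *b)
{
	pthread_mutex_unlock(&a->lock);
	if (b != a)
		pthread_mutex_unlock(&b->lock);
}
```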
|
#
efbed4dc |
| 17-Oct-2013 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add ratelimiting to ext4 messages
In the case of a storage device that suddenly disappears, or in the case of significant file system corruption, this can result in a huge flood of messages being sent to the console. This can overflow the file system containing /var/log/messages, or, if a serial console is configured, it can slow down the system so much that a hardware watchdog ends up triggering, forcing a system reboot.
Google-Bug-Id: 7258357
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
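A fixed-window rate limiter is enough to illustrate the idea: allow at most `burst` messages per `interval` and count the rest. This is a hypothetical sketch. The kernel has its own `struct ratelimit_state` and `__ratelimit()` helpers, and their exact behaviour differs.

```c
#include <stdbool.h>

/* Hypothetical fixed-window limiter. */
struct ratelimit {
	long interval;  /* window length, in caller-defined time units */
	int burst;      /* messages allowed per window */
	long begin;     /* start of the current window */
	int printed;    /* messages allowed in the current window */
	int missed;     /* total messages suppressed so far */
};

/* Returns true if a message may be emitted at time `now`. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;  /* start a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

Passing the current time in, rather than reading a clock inside the function, keeps the sketch deterministic and easy to test.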
|
#
7b7a8665 |
| 04-Sep-2013 |
Christoph Hellwig <hch@infradead.org> |
direct-io: Implement generic deferred AIO completions
Add support to the core direct-io code to defer AIO completions to user context using a workqueue. This replaces open-coded and less efficient code in XFS and ext4 (we save a memory allocation for each direct IO) and will be needed to properly support O_(D)SYNC for AIO.
The communication between the filesystem and the direct I/O code requires a new buffer head flag, which is a bit ugly but not avoidable until the direct I/O code stops abusing the buffer_head structure for communicating with the filesystems.
Currently this creates a per-superblock unbound workqueue for these completions, which is taken from an earlier patch by Jan Kara. I'm not really convinced about this use and would prefer a "normal" global workqueue with a high concurrency limit, but this needs further discussion.
JK: Fixed ext4 part, dynamic allocation of the workqueue.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
87a39389 |
| 28-Aug-2013 |
Darrick J. Wong <darrick.wong@oracle.com> |
ext4: mark block group as corrupt on inode bitmap error
If we detect either a discrepancy between the inode bitmap and the inode counts or the inode bitmap fails to pass validation checks, mark the block group corrupt and refuse to allocate or deallocate inodes from the group.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
163a203d |
| 28-Aug-2013 |
Darrick J. Wong <darrick.wong@oracle.com> |
ext4: mark block group as corrupt on block bitmap error
When we notice a block-bitmap corruption (because of device failure or something else), we should mark this group as corrupt and prevent further block allocations/deallocations from it. Currently, we end up generating one error message for every block in the bitmap. This could potentially make the system unstable, as noticed in some bugs. With this patch, the error is printed only the first time and the entire block group is marked as corrupted. This prevents future allocations/deallocations from it.
Also tested by corrupting the block bitmap and forcefully introducing the mb_free_blocks error:
(1) Create a large file (2 GB):
    $ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200
(2) Unmount the filesystem. Use dumpe2fs to see which block bitmaps are in use by largefile and note their block numbers.
(3) Use dd to zero out the used block bitmaps:
    $ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct
(4) Mount the FS and delete largefile.
(5) Recreate largefile. Verify that the new largefile does not get any blocks from the groups marked as bad.
Without the patch, we see an mb_free_blocks error for each bit in each zeroed-out bitmap at (4). With the patch, we only see the error once per block group:
[ 309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[ 309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[ 309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
Google-Bug-Id: 7258357
[darrick.wong@oracle.com] Further modifications (by Darrick) to make more obvious that this corruption bit applies to blocks only. Set the corruption flag if the block group bitmap verification fails.
Original-author: Aditya Kali <adityakali@google.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
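The essential change is to report once and then fence the group off. Below is a minimal sketch, assuming a hypothetical per-group flag word and free-cluster counts. `struct group_info`, `GRP_BBITMAP_CORRUPT`, `check_group()` and `group_usable()` are illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>

#define GRP_BBITMAP_CORRUPT 0x1u  /* hypothetical flag bit */

struct group_info {
	unsigned int state;
	unsigned int free_in_bitmap;  /* free clusters counted from the bitmap */
	unsigned int free_in_desc;    /* free clusters recorded in the descriptor */
};

static int errors_logged;

/* Returns false and marks the group if bitmap and descriptor disagree.
 * The error is reported only when the flag is first set. */
static bool check_group(unsigned int group, struct group_info *gi)
{
	if (gi->free_in_bitmap == gi->free_in_desc)
		return true;
	if (!(gi->state & GRP_BBITMAP_CORRUPT)) {
		gi->state |= GRP_BBITMAP_CORRUPT;
		errors_logged++;
		fprintf(stderr, "group %u: %u clusters in bitmap, %u in gd. blk grp corrupted.\n",
			group, gi->free_in_bitmap, gi->free_in_desc);
	}
	return false;
}

/* Allocation and freeing should skip any group already marked corrupt. */
static bool group_usable(const struct group_info *gi)
{
	return !(gi->state & GRP_BBITMAP_CORRUPT);
}
```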
|
#
dbde0abe |
| 28-Aug-2013 |
Darrick J. Wong <darrick.wong@oracle.com> |
ext4: fix type declaration of ext4_validate_block_bitmap
The block_group parameter to ext4_validate_block_bitmap is used as an ext4_group_t inside the function, and the same type is passed in by all callers. We might as well use the typedef consistently instead of open-coding the 'unsigned int'.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
d7b2a00c |
| 28-Aug-2013 |
Zheng Liu <wenqing.lz@taobao.com> |
ext4: isolate ext4_extents.h file
Commit 4a092d73 reduced the number of source files that need to #include ext4_extents.h, but we can do better.
This commit defines ext4_zeroout_es() in extents.c and moves EXT_MAX_BLOCKS into ext4.h, so that indirect.c and ioctl.c no longer need to include ext4_extents.h. Meanwhile, extent_status.c only needs to include this file when ES_AGGRESSIVE_TEST is defined. In addition, this commit removes a duplicated declaration in trace/events/ext4.h.
After this patch, only {super,migrate,move_extents,extents}.c need to include ext4_extents.h, which makes it easy for us to define a new extent disk layout.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
90e775b7 |
| 17-Aug-2013 |
Jan Kara <jack@suse.cz> |
ext4: fix lost truncate due to race with writeback
The following race can lead to the loss of the i_disksize update from truncate, resulting in a wrong inode size if the inode size isn't updated again before the inode is reclaimed:

ext4_setattr()                                mpage_map_and_submit_extent()
  EXT4_I(inode)->i_disksize = attr->ia_size;
  ...                                         ...
                                                disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
                                                /* False because i_size isn't
                                                 * updated yet */
                                                if (disksize > i_size_read(inode))
                                                /* True, because i_disksize is
                                                 * already truncated */
                                                if (disksize > EXT4_I(inode)->i_disksize)
                                                  /* Overwrite i_disksize
                                                   * update from truncate */
                                                  ext4_update_i_disksize()
  i_size_write(inode, attr->ia_size);

For other places that update i_disksize, such a race cannot happen because i_mutex prevents it. Writeback is the only place where we do not hold i_mutex, and we cannot grab it there because of lock ordering.
We fix the race by doing both i_disksize and i_size update in truncate atomically under i_data_sem and in mpage_map_and_submit_extent() we move the check against i_size under i_data_sem as well.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
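The fix can be sketched with a single lock standing in for `i_data_sem`. Truncate updates both sizes inside it. Writeback clamps against `i_size` and compares against `i_disksize` inside the same critical section, so it can no longer undo a truncate. The names are hypothetical, and a pthread mutex replaces the kernel's read/write semaphore.

```c
#include <pthread.h>

/* Hypothetical inode holding the two sizes involved in the race. */
struct inode_sizes {
	pthread_mutex_t data_sem;  /* stands in for i_data_sem */
	long long i_size;          /* in-memory file size */
	long long i_disksize;      /* size recorded on disk */
};

/* Truncate: update both sizes atomically with respect to writeback. */
static void truncate_sizes(struct inode_sizes *in, long long new_size)
{
	pthread_mutex_lock(&in->data_sem);
	in->i_disksize = new_size;
	in->i_size = new_size;
	pthread_mutex_unlock(&in->data_sem);
}

/* Writeback: clamp to i_size and only ever grow i_disksize, with both
 * checks made under the same lock as truncate. */
static void writeback_update_disksize(struct inode_sizes *in, long long disksize)
{
	pthread_mutex_lock(&in->data_sem);
	if (disksize > in->i_size)
		disksize = in->i_size;
	if (disksize > in->i_disksize)
		in->i_disksize = disksize;
	pthread_mutex_unlock(&in->data_sem);
}
```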
|
#
7d734532 |
| 17-Aug-2013 |
Jan Kara <jack@suse.cz> |
ext4: fix warning in ext4_da_update_reserve_space()
The reaim workfile.dbase test easily triggers a warning in ext4_da_update_reserve_space():
EXT4-fs warning (device ram0): ext4_da_update_reserve_space:365: ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 9 data blocks)
The problem is that one of the tests creates a file and then randomly writes to it with O_SYNC. That results in writing back pages of the file in random order, so we create extents for written blocks, say 0, 2, 4, 6, 8; this last allocation also allocates a new block for extents. Then we write out block 1, so we have extents 0-2, 4, 6, 8, and we release the indirect extent block because the extents fit in the inode again. Then we write out block 10 and need to allocate the indirect extent block again, which triggers the warning because we no longer have the reservation.
Fix the problem by giving back freed metadata blocks resulting from extent merging into the inode's reservation pool.
Signed-off-by: Jan Kara <jack@suse.cz>
|
#
7869a4a6 |
| 16-Aug-2013 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add support for extent pre-caching
Add a new fiemap flag which forces all of the extents in an inode to be cached in the extent_status tree. This is critically important when using AIO to a preallocated file, since if we need to read in blocks from the extent tree, the io_submit(2) system call becomes synchronous, and the AIO is no longer "A", which is bad.
In addition, for most files which have an external leaf tree block, the cost of caching the information in the extent status tree will be less than caching the entire 4k block in the buffer cache. So it is generally a win to keep the extent information cached.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
107a7bd3 |
| 16-Aug-2013 |
Theodore Ts'o <tytso@mit.edu> |
ext4: cache all of an extent tree's leaf block upon reading
When we read in an extent tree leaf block from disk, arrange to have all of its entries cached. In nearly all cases the in-memory representation will be more compact than the on-disk representation in the buffer cache, and it allows us to get the information without having to traverse the extent tree for successive extents.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
|
#
a361293f |
| 16-Aug-2013 |
Jan Kara <jack@suse.cz> |
jbd2: Fix oops in jbd2_journal_file_inode()
Commit 0713ed0cde76438d05849f1537d3aab46e099475 added a jbd2_journal_file_inode() call into ext4_block_zero_page_range(). However, that function gets called from the truncate path, and thus the inode need not have a jinode attached: that happens in ext4_file_open(), but the file need not have been opened since mount. Calling jbd2_journal_file_inode() without a jinode attached results in the oops.
We fix the problem by attaching jinode to inode also in ext4_truncate() and ext4_punch_hole() when we are going to zero out partial blocks.
Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
aeb2817a |
| 01-Jul-2013 |
Ashish Sangwan <a.sangwan@samsung.com> |
ext4: pass inode pointer instead of file pointer to punch hole
No need to pass file pointer when we can directly pass inode pointer.
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e7c96e8e |
| 01-Jul-2013 |
Joe Perches <joe@perches.com> |
ext4: reduce object size when !CONFIG_PRINTK
Reducing the object size by ~10% could be useful for embedded systems.
Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and arguments, passing " " to functions when !CONFIG_PRINTK and still verifying format and arguments with no_printk.
$ size fs/ext4/built-in.o*
   text    data     bss     dec     hex filename
 239375     610     888  240873   3ace9 fs/ext4/built-in.o.new
 264167     738     888  265793   40e41 fs/ext4/built-in.o.old
$ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config
# CONFIG_PRINTK is not set
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
CONFIG_EXT4_FS_POSIX_ACL=y
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
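The `no_printk()` trick can be shown in userspace. The `format(printf, ...)` attribute still type-checks the format string and arguments, but only a blank string reaches the real handler, so the original format text is no longer referenced. `fs_error()`, `fs_error_impl()` and the local `no_printk()` are hypothetical stand-ins modelled on the description above. `##__VA_ARGS__` is a GNU extension.

```c
#include <stdarg.h>
#include <stdio.h>

/* Does nothing at run time, but the compiler still checks fmt against args. */
static inline __attribute__((format(printf, 1, 2)))
int no_printk(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

static int errors_reported;
static char last_msg[128];

/* Hypothetical error handler: its side effect (counting the error) always
 * runs, and it formats whatever message it is given. */
static __attribute__((format(printf, 1, 2)))
void fs_error_impl(const char *fmt, ...)
{
	va_list ap;

	errors_reported++;
	va_start(ap, fmt);
	vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
}

#ifdef CONFIG_PRINTK
#define fs_error(fmt, ...) fs_error_impl(fmt, ##__VA_ARGS__)
#else
/* Type-check fmt and arguments, then pass " " so the format text can be
 * dropped from the object file. */
#define fs_error(fmt, ...)                      \
	do {                                    \
		no_printk(fmt, ##__VA_ARGS__);  \
		fs_error_impl(" ");             \
	} while (0)
#endif
```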
|
#
d3922a77 |
| 01-Jul-2013 |
Zheng Liu <wenqing.lz@taobao.com> |
ext4: improve extent cache shrink mechanism to avoid to burn CPU time
Now we maintain a proper in-order LRU list in ext4 to reclaim entries from the extent status tree when we are under heavy memory pressure. To keep this order, a spin lock is used to protect the list, but contention on this lock burns a lot of CPU time. We can use the following steps to trigger it:
  % cd /dev/shm
  % dd if=/dev/zero of=ext4-img bs=1M count=2k
  % mkfs.ext4 ext4-img
  % mount -t ext4 -o loop ext4-img /mnt
  % cd /mnt
  % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
  % for ((i=0;i<160;i++)); do cp $i /dev/null & done
  % perf record -a -g
  % perf report
This commit tries to fix this problem. A new member called i_touch_when is added to ext4_inode_info to record the last access time of an inode, so we no longer need to keep the LRU list strictly in order, which avoids burning CPU time. When we try to reclaim entries from the extent status tree, we use list_sort() to get a properly ordered list and then traverse it to discard entries. In ext4_sb_info, s_es_last_sorted records the last time the list was sorted. While traversing the list, we skip any inode that is newer than this time and move it to the tail of the LRU list. When the head of the list is newer than s_es_last_sorted, we sort the LRU list again.
In this commit, we break the loop if s_extent_cache_cnt == 0 because that means that all extents in extent status tree have been reclaimed.
Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is changed to save a local variable in these functions.
Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
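Below is a simplified sketch of the lazy-LRU scheme. It uses an array of pointers and `qsort()` in place of a linked list and `list_sort()`. Accesses would only update `touch_when`, and the shrinker sorts only when the head looks newer than the last sort time. Unlike the commit, this sketch skips recently touched inodes instead of moving them to the tail. All names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct es_inode {
	unsigned long touch_when;  /* last access time (stand-in for i_touch_when) */
	int nr_cached;             /* cached extent entries */
};

/* Oldest touch time first. */
static int by_touch_time(const void *a, const void *b)
{
	const struct es_inode *x = *(struct es_inode *const *)a;
	const struct es_inode *y = *(struct es_inode *const *)b;

	return (x->touch_when > y->touch_when) - (x->touch_when < y->touch_when);
}

/* Reclaim up to nr_to_scan entries from the "LRU" array of n inodes. */
static int shrink(struct es_inode **lru, size_t n, unsigned long *last_sorted,
		  unsigned long now, int nr_to_scan)
{
	int freed = 0;

	if (n == 0)
		return 0;
	/* Sort lazily: only when the head was touched after the last sort. */
	if (lru[0]->touch_when > *last_sorted) {
		qsort(lru, n, sizeof(lru[0]), by_touch_time);
		*last_sorted = now;
	}
	for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
		struct es_inode *ei = lru[i];

		if (ei->touch_when > *last_sorted)
			continue;  /* touched since the sort: leave it alone */
		int left = nr_to_scan - freed;
		int take = ei->nr_cached < left ? ei->nr_cached : left;
		ei->nr_cached -= take;
		freed += take;
	}
	return freed;
}
```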
|
#
725bebb2 |
| 17-May-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
[readdir] convert ext4
and trim the living hell out of bogosities in inline dir case
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2f2e09eb |
| 06-Jun-2013 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add sanity check to ext4_get_group_info()
The group number passed to ext4_get_group_info() should be valid, but let's add an assert to check this before we start creating a pointer based on that group number and dereferencing it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
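Here is a minimal sketch of such a check. It assumes a two-level group table indexed with a hypothetical `DESC_PER_BLOCK_BITS`, and userspace `assert()` stands in for whatever kernel assertion macro the commit uses. `struct sb_info`, `struct group_info` and `get_group_info()` are illustrative names.

```c
#include <assert.h>

#define DESC_PER_BLOCK_BITS 2  /* hypothetical: 4 entries per table block */
#define DESC_PER_BLOCK      (1u << DESC_PER_BLOCK_BITS)

struct group_info { unsigned int free_clusters; };

struct sb_info {
	unsigned int groups_count;
	struct group_info **group_info;  /* two-level table: [block][entry] */
};

static struct group_info *get_group_info(const struct sb_info *sbi, unsigned int group)
{
	unsigned int indexv, indexh;

	/* Sanity check before the group number is turned into a pointer. */
	assert(group < sbi->groups_count);
	indexv = group >> DESC_PER_BLOCK_BITS;   /* which table block */
	indexh = group & (DESC_PER_BLOCK - 1);   /* entry within that block */
	return &sbi->group_info[indexv][indexh];
}
```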
|
#
5dc23bdd |
| 04-Jun-2013 |
Jan Kara <jack@suse.cz> |
ext4: remove ext4_ioend_wait()
Now that we clear PageWriteback after extent conversion, there's no need to wait for io_end processing in ext4_evict_inode(). Running AIO/DIO keeps a file reference until aio_complete() is called, so ext4_evict_inode() cannot be called. For io_end structures resulting from buffered IO, the waiting already happens because we wait for PageWriteback in truncate_inode_pages().
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
c724585b |
| 04-Jun-2013 |
Jan Kara <jack@suse.cz> |
ext4: don't wait for extent conversion in ext4_punch_hole()
We don't have to wait for extent conversion in ext4_punch_hole() as buffered IO for the punched range has been flushed and waited upon (thus all extent conversions for that range have completed). Also we wait for all DIO to finish using inode_dio_wait() so there cannot be any extent conversions pending due to direct IO.
Also remove ext4_flush_unwritten_io() since it's unused now.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
b0857d30 |
| 04-Jun-2013 |
Jan Kara <jack@suse.cz> |
ext4: defer clearing of PageWriteback after extent conversion
Currently the PageWriteback bit gets cleared from put_io_page() called from ext4_end_bio(). This is somewhat inconvenient, as the extent tree is not fully updated at that time (unwritten extents are not marked as written), so we cannot read the data back yet. This design was dictated by lock ordering, as we cannot start a transaction while the PageWriteback bit is set (we could easily deadlock with ext4_da_writepages()). But now that we use transaction reservation for extent conversion, the locking issues are solved and we can move PageWriteback bit clearing after extent conversion is done. As a result, we can remove the wait for unwritten extent conversion from ext4_sync_file(), because it already implicitly happens through wait_on_page_writeback().
We implement deferring of PageWriteback clearing by queueing completed bios to appropriate io_end and processing all the pages when io_end is going to be freed instead of at the moment ext4_io_end() is called.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
2e8fa54e |
| 04-Jun-2013 |
Jan Kara <jack@suse.cz> |
ext4: split extent conversion lists to reserved & unreserved parts
Now that we have extent conversions with a reserved transaction, we have to prevent extent conversions without a reserved transaction (from DIO code) from blocking them (as that would effectively void any transaction reservation we made). So split the lists, work items, and work queues into reserved and unreserved parts.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|