Revision tags: v5.16 |
|
#
a2e3965d |
| 28-Dec-2021 |
xu xin <xu.xin16@zte.com.cn> |
ext4: use BUG_ON instead of if condition followed by BUG
BUG_ON would be better.
This issue was detected with the help of Coccinelle.
Reported-by: Zeal robot <zealci@zte.com.cn> Reviewed-by: Lukas
ext4: use BUG_ON instead of if condition followed by BUG
BUG_ON would be better.
This issue was detected with the help of Coccinelle.
Reported-by: Zeal robot <zealci@zte.com.cn> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Link: https://lore.kernel.org/r/20211228073252.580296-1-xu.xin16@zte.com.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
Revision tags: v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1 |
|
#
2327fb2e |
| 03-Nov-2021 |
Lukas Czerner <lczerner@redhat.com> |
ext4: change s_last_trim_minblks type to unsigned long
There is no good reason for the s_last_trim_minblks to be atomic. There is no data integrity needed and there is no real danger in setting and
ext4: change s_last_trim_minblks type to unsigned long
There is no good reason for the s_last_trim_minblks to be atomic. There is no data integrity needed and there is no real danger in setting and reading it in a racy manner. Change it to be unsigned long, the same type as s_clusters_per_group which is the maximum that's allowed.
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Suggested-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20211103145122.17338-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
bbc605cd |
| 13-Dec-2021 |
Lukas Czerner <lczerner@redhat.com> |
ext4: implement support for get/set fs label
Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for online reading and setting of file system label.
ext4_ioctl_getlabel() is simpl
ext4: implement support for get/set fs label
Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for online reading and setting of file system label.
ext4_ioctl_getlabel() is simple, just get the label from the primary superblock. This might not be the first sb on the file system if 'sb=' mount option is used.
In ext4_ioctl_setlabel() we update what ext4 currently views as a primary superblock and then proceed to update backup superblocks. There are two caveats: - the primary superblock might not be the first superblock and so it might not be the one used by userspace tools if read directly off the disk. - because the primary superblock might not be the first superblock we potentialy have to update it as part of backup superblock update. However the first sb location is a bit more complicated than the rest so we have to account for that.
The superblock modification is created generic enough so the infrastructure can be used for other potential superblock modification operations, such as chaning UUID.
Tested with generic/492 with various configurations. I also checked the behavior with 'sb=' mount options, including very large file systems with and without sparse_super/sparse_super2.
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
ab047d51 |
| 23-Dec-2021 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
The kmemcache for ext4_fc_dentry_cachep remains registered after module removal.
Destroy ext4_fc_dentry_cachep kmemcache on module re
ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
The kmemcache for ext4_fc_dentry_cachep remains registered after module removal.
Destroy ext4_fc_dentry_cachep kmemcache on module removal.
Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
show more ...
|
#
0915e464 |
| 23-Dec-2021 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4: simplify updating of fast commit stats
Move fast commit stats updating logic to a separate function from ext4_fc_commit(). This significantly improves readability of ext4_fc_commit().
Signed-
ext4: simplify updating of fast commit stats
Move fast commit stats updating logic to a separate function from ext4_fc_commit(). This significantly improves readability of ext4_fc_commit().
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-4-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
7bbbe241 |
| 23-Dec-2021 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4: drop ineligible txn start stop APIs
This patch drops ext4_fc_start_ineligible() and ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions should simply call ext4_fc_mark_ineligib
ext4: drop ineligible txn start stop APIs
This patch drops ext4_fc_start_ineligible() and ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions should simply call ext4_fc_mark_ineligible() after starting the trasaction.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-3-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
cd913c76 |
| 29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dax: return the partition offset from fs_dax_get_by_bdev
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file
dax: return the partition offset from fs_dax_get_by_bdev
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51 |
|
#
31d21d21 |
| 19-Jul-2021 |
Xiyu Yang <xiyuyang19@fudan.edu.cn> |
ext4: convert from atomic_t to refcount_t on ext4_io_end->count
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situat
ext4: convert from atomic_t to refcount_t on ext4_io_end->count
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1626674355-55795-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
d973bc80 |
| 19-May-2022 |
Eric Biggers <ebiggers@google.com> |
ext4: only allow test_dummy_encryption when supported
commit 5f41fdaea63ddf96d921ab36b2af4a90ccdb5744 upstream.
Make the test_dummy_encryption mount option require that the encrypt feature flag be
ext4: only allow test_dummy_encryption when supported
commit 5f41fdaea63ddf96d921ab36b2af4a90ccdb5744 upstream.
Make the test_dummy_encryption mount option require that the encrypt feature flag be already enabled on the filesystem, rather than automatically enabling it. Practically, this means that "-O encrypt" will need to be included in MKFS_OPTIONS when running xfstests with the test_dummy_encryption mount option. (ext4/053 also needs an update.)
Moreover, as long as the preconditions for test_dummy_encryption are being tightened anyway, take the opportunity to start rejecting it when !CONFIG_FS_ENCRYPTION rather than ignoring it.
The motivation for requiring the encrypt feature flag is that:
- Having the filesystem auto-enable feature flags is problematic, as it bypasses the usual sanity checks. The specific issue which came up recently is that in kernel versions where ext4 supports casefold but not encrypt+casefold (v5.1 through v5.10), the kernel will happily add the encrypt flag to a filesystem that has the casefold flag, making it unmountable -- but only for subsequent mounts, not the initial one. This confused the casefold support detection in xfstests, causing generic/556 to fail rather than be skipped.
- The xfstests-bld test runners (kvm-xfstests et al.) already use the required mkfs flag, so they will not be affected by this change. Only users of test_dummy_encryption alone will be affected. But, this option has always been for testing only, so it should be fine to require that the few users of this option update their test scripts.
- f2fs already requires it (for its equivalent feature flag).
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/20220519204437.61645-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
d973bc80 |
| 19-May-2022 |
Eric Biggers <ebiggers@google.com> |
ext4: only allow test_dummy_encryption when supported
commit 5f41fdaea63ddf96d921ab36b2af4a90ccdb5744 upstream.
Make the test_dummy_encryption mount option require that the encrypt feature flag be
ext4: only allow test_dummy_encryption when supported
commit 5f41fdaea63ddf96d921ab36b2af4a90ccdb5744 upstream.
Make the test_dummy_encryption mount option require that the encrypt feature flag be already enabled on the filesystem, rather than automatically enabling it. Practically, this means that "-O encrypt" will need to be included in MKFS_OPTIONS when running xfstests with the test_dummy_encryption mount option. (ext4/053 also needs an update.)
Moreover, as long as the preconditions for test_dummy_encryption are being tightened anyway, take the opportunity to start rejecting it when !CONFIG_FS_ENCRYPTION rather than ignoring it.
The motivation for requiring the encrypt feature flag is that:
- Having the filesystem auto-enable feature flags is problematic, as it bypasses the usual sanity checks. The specific issue which came up recently is that in kernel versions where ext4 supports casefold but not encrypt+casefold (v5.1 through v5.10), the kernel will happily add the encrypt flag to a filesystem that has the casefold flag, making it unmountable -- but only for subsequent mounts, not the initial one. This confused the casefold support detection in xfstests, causing generic/556 to fail rather than be skipped.
- The xfstests-bld test runners (kvm-xfstests et al.) already use the required mkfs flag, so they will not be affected by this change. Only users of test_dummy_encryption alone will be affected. But, this option has always been for testing only, so it should be fine to require that the few users of this option update their test scripts.
- f2fs already requires it (for its equivalent feature flag).
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/20220519204437.61645-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
e3912775 |
| 24-Mar-2022 |
Ye Bin <yebin10@huawei.com> |
ext4: fix use-after-free in ext4_search_dir
commit c186f0887fe7061a35cebef024550ec33ef8fbd8 upstream.
We got issue as follows: EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=con
ext4: fix use-after-free in ext4_search_dir
commit c186f0887fe7061a35cebef024550ec33ef8fbd8 upstream.
We got issue as follows: EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue ================================================================== BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline] BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline] BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331
CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:83 [inline] dump_stack+0x144/0x187 lib/dump_stack.c:124 print_address_description+0x7d/0x630 mm/kasan/report.c:387 __kasan_report+0x132/0x190 mm/kasan/report.c:547 kasan_report+0x47/0x60 mm/kasan/report.c:564 ext4_search_dir fs/ext4/namei.c:1394 [inline] search_dirblock fs/ext4/namei.c:1199 [inline] __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553 ext4_lookup_entry fs/ext4/namei.c:1622 [inline] ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690 __lookup_hash+0xc5/0x190 fs/namei.c:1451 do_rmdir+0x19e/0x310 fs/namei.c:3760 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x445e59 Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002 R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
The buggy address belongs to the page: page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3 flags: 0x200000000000000() raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ==================================================================
ext4_search_dir: ... de = (struct ext4_dir_entry_2 *)search_buf; dlimit = search_buf + buf_size; while ((char *) de < dlimit) { ... if ((char *) de + de->name_len <= dlimit && ext4_match(dir, fname, de)) { ... } ... de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize); if (de_len <= 0) return -1; offset += de_len; de = (struct ext4_dir_entry_2 *) ((char *) de + de_len); }
Assume: de=0xffff8881317c2fff dlimit=0x0xffff8881317c3000
If read 'de->name_len' which address is 0xffff8881317c3005, obviously is out of range, then will trigger use-after-free. To solve this issue, 'dlimit' must reserve 8 bytes, as we will read 'de->name_len' to judge if '(char *) de + de->name_len' out of range.
Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
ba50ea45 |
| 08-Mar-2022 |
Darrick J. Wong <djwong@kernel.org> |
ext4: fix fallocate to use file_modified to update permissions consistently
commit ad5cd4f4ee4d5fcdb1bfb7a0c073072961e70783 upstream.
Since the initial introduction of (posix) fallocate back at the
ext4: fix fallocate to use file_modified to update permissions consistently
commit ad5cd4f4ee4d5fcdb1bfb7a0c073072961e70783 upstream.
Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero, collapse, insert range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities.
The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
6f6ffc71 |
| 17-Jan-2022 |
Xin Yin <yinxin.x@bytedance.com> |
ext4: fast commit may miss file actions
[ Upstream commit bdc8a53a6f2f0b1cb5f991440f2100732299eb93 ]
in the follow scenario: 1. jbd start transaction n 2. task A get new handle for transaction n+1
ext4: fast commit may miss file actions
[ Upstream commit bdc8a53a6f2f0b1cb5f991440f2100732299eb93 ]
in the follow scenario: 1. jbd start transaction n 2. task A get new handle for transaction n+1 3. task A do some actions and add inode to FC_Q_MAIN fc_q 4. jbd complete transaction n and clear FC_Q_MAIN fc_q 5. task A call fsync
Fast commit will lost the file actions during a full commit.
we should also add updates to staging queue during a full commit. and in ext4_fc_cleanup(), when reset a inode's fc track range, check it's i_sync_tid, if it bigger than current transaction tid, do not rest it, or we will lost the track range.
And EXT4_MF_FC_COMMITTING is not needed anymore, so drop it.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220117093655.35160-3-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
97abcfed |
| 17-Jan-2022 |
Xin Yin <yinxin.x@bytedance.com> |
ext4: fast commit may not fallback for ineligible commit
[ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ]
For the follow scenario: 1. jbd start commit transaction n 2. task A get new ha
ext4: fast commit may not fallback for ineligible commit
[ Upstream commit e85c81ba8859a4c839bcd69c5d83b32954133a5b ]
For the follow scenario: 1. jbd start commit transaction n 2. task A get new handle for transaction n+1 3. task A do some ineligible actions and mark FC_INELIGIBLE 4. jbd complete transaction n and clean FC_INELIGIBLE 5. task A call fsync
In this case fast commit will not fallback to full commit and transaction n+1 also not handled by jbd.
Make ext4_fc_mark_ineligible() also record transaction tid for latest ineligible case, when call ext4_fc_cleanup() check current transaction tid, if small than latest ineligible tid do not clear the EXT4_MF_FC_INELIGIBLE.
Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
647b3f15 |
| 23-Dec-2021 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4: simplify updating of fast commit stats
[ Upstream commit 0915e464cb274648e1ef1663e1356e53ff400983 ]
Move fast commit stats updating logic to a separate function from ext4_fc_commit(). This si
ext4: simplify updating of fast commit stats
[ Upstream commit 0915e464cb274648e1ef1663e1356e53ff400983 ]
Move fast commit stats updating logic to a separate function from ext4_fc_commit(). This significantly improves readability of ext4_fc_commit().
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-4-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
5abb1d84 |
| 23-Dec-2021 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4: drop ineligible txn start stop APIs
[ Upstream commit 7bbbe241ec7ce0def9f71464c878fdbd2b0dcf37 ]
This patch drops ext4_fc_start_ineligible() and ext4_fc_stop_ineligible() APIs. Fast commit in
ext4: drop ineligible txn start stop APIs
[ Upstream commit 7bbbe241ec7ce0def9f71464c878fdbd2b0dcf37 ]
This patch drops ext4_fc_start_ineligible() and ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions should simply call ext4_fc_mark_ineligible() after starting the trasaction.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-3-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
0cb4480b |
| 09-Jan-2022 |
Xin Yin <yinxin.x@bytedance.com> |
ext4: prevent used blocks from being allocated during fast commit replay
commit 599ea31d13617c5484c40cdf50d88301dc351cfc upstream.
During fast commit replay procedure, we clear inode blocks bitmap
ext4: prevent used blocks from being allocated during fast commit replay
commit 599ea31d13617c5484c40cdf50d88301dc351cfc upstream.
During fast commit replay procedure, we clear inode blocks bitmap in ext4_ext_clear_bb(), this may cause ext4_mb_new_blocks_simple() allocate blocks still in use.
Make ext4_fc_record_regions() also record physical disk regions used by inodes during replay procedure. Then ext4_mb_new_blocks_simple() can excludes these blocks in use.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220110035141.1980-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
c8577696 |
| 23-Dec-2021 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
commit ab047d516dea72f011c15c04a929851e4d053109 upstream.
The kmemcache for ext4_fc_dentry_cachep remains registered after module rem
ext4: destroy ext4_fc_dentry_cachep kmemcache on module removal
commit ab047d516dea72f011c15c04a929851e4d053109 upstream.
The kmemcache for ext4_fc_dentry_cachep remains registered after module removal.
Destroy ext4_fc_dentry_cachep kmemcache on module removal.
Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
6984aef5 |
| 16-Jul-2021 |
Zhang Yi <yi.zhang@huawei.com> |
ext4: factor out write end code of inline file
Now that the inline_data file write end procedure are falled into the common write end functions, it is not clear. Factor them out and do some cleanup.
ext4: factor out write end code of inline file
Now that the inline_data file write end procedure are falled into the common write end functions, it is not clear. Factor them out and do some cleanup. This patch also drop ext4_da_write_inline_data_end() and switch to use ext4_write_inline_data_end() instead because we also need to do the same error processing if we failed to write data into inline entry.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210716122024.1105856-4-yi.zhang@huawei.com
show more ...
|
#
4a79a98c |
| 16-Aug-2021 |
Jan Kara <jack@suse.cz> |
ext4: Improve scalability of ext4 orphan file handling
Even though the length of the critical section when adding / removing orphaned inodes was significantly reduced by using orphan file, the conte
ext4: Improve scalability of ext4 orphan file handling
Even though the length of the critical section when adding / removing orphaned inodes was significantly reduced by using orphan file, the contention of lock protecting orphan file still appears high in profiles for truncate / unlink intensive workloads with high number of threads.
This patch makes handling of orphan file completely lockless. Also to reduce conflicts between CPUs different CPUs start searching for empty slot in orphan file in different blocks.
Performance comparison of locked orphan file handling, lockless orphan file handling, and completely disabled orphan inode handling from 80 CPU Xeon Server with 526 GB of RAM, filesystem located on SAS SSD disk, average of 5 runs:
stress-orphan (microbenchmark truncating files byte-by-byte from N processes in parallel)
Threads Time Time Time Orphan locked Orphan lockless No orphan 1 0.945600 0.939400 0.891200 2 1.331800 1.246600 1.174400 4 1.995000 1.780600 1.713200 8 6.424200 4.900000 4.106000 16 14.937600 8.516400 8.138000 32 33.038200 24.565600 24.002200 64 60.823600 39.844600 38.440200 128 122.941400 70.950400 69.315000
So we can see that with lockless orphan file handling, addition / deletion of orphaned inodes got almost completely out of picture even for a microbenchmark stressing it.
For reaim creat_clo workload on ramdisk there are also noticeable gains (average of 5 runs):
Clients Vanilla (ops/s) Patched (ops/s) creat_clo-1 14705.88 ( 0.00%) 14354.07 * -2.39%* creat_clo-3 27108.43 ( 0.00%) 28301.89 ( 4.40%) creat_clo-5 37406.48 ( 0.00%) 45180.73 * 20.78%* creat_clo-7 41338.58 ( 0.00%) 54687.50 * 32.29%* creat_clo-9 45226.13 ( 0.00%) 62937.07 * 39.16%* creat_clo-11 44000.00 ( 0.00%) 65088.76 * 47.93%* creat_clo-13 36516.85 ( 0.00%) 68661.97 * 88.03%* creat_clo-15 30864.20 ( 0.00%) 69551.78 * 125.35%* creat_clo-17 27478.45 ( 0.00%) 67729.08 * 146.48%* creat_clo-19 25000.00 ( 0.00%) 61621.62 * 146.49%* creat_clo-21 18772.35 ( 0.00%) 63829.79 * 240.02%* creat_clo-23 16698.94 ( 0.00%) 61938.96 * 270.92%* creat_clo-25 14973.05 ( 0.00%) 56947.61 * 280.33%* creat_clo-27 16436.69 ( 0.00%) 65008.03 * 295.51%* creat_clo-29 13949.01 ( 0.00%) 69047.62 * 395.00%* creat_clo-31 14283.52 ( 0.00%) 67982.45 * 375.95%*
Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
02f310fc |
| 16-Aug-2021 |
Jan Kara <jack@suse.cz> |
ext4: Speedup ext4 orphan inode handling
Ext4 orphan inode handling is a bottleneck for workloads which heavily truncate / unlink small files since it contends on the global s_orphan_mutex lock (and
ext4: Speedup ext4 orphan inode handling
Ext4 orphan inode handling is a bottleneck for workloads which heavily truncate / unlink small files since it contends on the global s_orphan_mutex lock (and generally it's difficult to improve scalability of the ondisk linked list of orphaned inodes).
This patch implements new way of handling orphan inodes. Instead of linking orphaned inode into a linked list, we store it's inode number in a new special file which we call "orphan file". Only if there's no more space in the orphan file (too many inodes are currently orphaned) we fall back to using old style linked list. Currently we protect operations in the orphan file with a spinlock for simplicity but even in this setting we can substantially reduce the length of the critical section and thus speedup some workloads. In the next patch we improve this by making orphan handling lockless.
Note that the change is backwards compatible when the filesystem is clean - the existence of the orphan file is a compat feature, we set another ro-compat feature indicating orphan file needs scanning for orphaned inodes when mounting filesystem read-write. This ro-compat feature gets cleared on unmount / remount read-only.
Some performance data from 80 CPU Xeon Server with 512 GB of RAM, filesystem located on SSD, average of 5 runs:
stress-orphan (microbenchmark truncating files byte-by-byte from N processes in parallel)
Threads Time Time Vanilla Patched 1 1.057200 0.945600 2 1.680400 1.331800 4 2.547000 1.995000 8 7.049400 6.424200 16 14.827800 14.937600 32 40.948200 33.038200 64 87.787400 60.823600 128 206.504000 122.941400
So we can see significant wins all over the board.
Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
25c6d98f |
| 16-Aug-2021 |
Jan Kara <jack@suse.cz> |
ext4: Move orphan inode handling into a separate file
Move functions for handling orphan inodes into a new file fs/ext4/orphan.c to have them in one place and somewhat reduce size of other files. No
ext4: Move orphan inode handling into a separate file
Move functions for handling orphan inodes into a new file fs/ext4/orphan.c to have them in one place and somewhat reduce size of other files. No code changes.
Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
188c299e |
| 16-Aug-2021 |
Jan Kara <jack@suse.cz> |
ext4: Support for checksumming from journal triggers
JBD2 layer support triggers which are called when journaling layer moves buffer to a certain state. We can use the frozen trigger, which gets cal
ext4: Support for checksumming from journal triggers
JBD2 layer support triggers which are called when journaling layer moves buffer to a certain state. We can use the frozen trigger, which gets called when buffer data is frozen and about to be written out to the journal, to compute block checksums for some buffer types (similarly as does ocfs2). This avoids unnecessary repeated recomputation of the checksum (at the cost of larger window where memory corruption won't be caught by checksumming) and is even necessary when there are unsynchronized updaters of the checksummed data.
So add superblock and journal trigger type arguments to ext4_journal_get_write_access() and ext4_journal_get_create_access() so that frozen triggers can be set accordingly. Also add inode argument to ext4_walk_page_buffers() and all the callbacks used with that function for the same purpose. This patch is mostly only a change of prototype of the above mentioned functions and a few small helpers. Real checksumming will come later.
Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
5036ab8d |
| 30-Aug-2021 |
Wang Jianchao <wangjianchao@kuaishou.com> |
ext4: flush background discard kwork when retry allocation
The background discard kwork tries to mark blocks used and issue discard. This can make filesystem suffer from NOSPC error, xfstest generic
ext4: flush background discard kwork when retry allocation
The background discard kwork tries to mark blocks used and issue discard. This can make filesystem suffer from NOSPC error, xfstest generic/371 can fail due to it. Fix it by flushing discard kwork in ext4_should_retry_alloc. At the same time, give up discard at the moment.
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Link: https://lore.kernel.org/r/20210830075246.12516-6-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
55cdd0af |
| 24-Jul-2021 |
Wang Jianchao <wangjianchao@kuaishou.com> |
ext4: get discard out of jbd2 commit kthread contex
Right now, discard is issued and waited to be completed in jbd2 commit kthread context after the logs are committed. When large amount of files ar
ext4: get discard out of jbd2 commit kthread contex
Right now, discard is issued and waited to be completed in jbd2 commit kthread context after the logs are committed. When large amount of files are deleted and discard is flooding, jbd2 commit kthread can be blocked for long time. Then all of the metadata operations can be blocked to wait the log space.
One case is the page fault path with read mm->mmap_sem held, which wants to update the file time but has to wait for the log space. When other threads in the task wants to do mmap, then write mmap_sem is blocked. Finally all of the following read mmap_sem requirements are blocked, even the ps command which need to read the /proc/pid/ -cmdline. Our monitor service which needs to read /proc/pid/cmdline used to be blocked for 5 mins.
This patch frees the blocks back to buddy after commit and then do discard in a async kworker context in fstrim fashion, namely, - mark blocks to be discarded as used if they have not been allocated - do discard - mark them free After this, jbd2 commit kthread won't be blocked any more by discard and we won't get NOSPC even if the discard is slow or throttled.
Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2 Suggested-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com> Link: https://lore.kernel.org/r/20210830075246.12516-5-jianchao.wan9@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|