#
493720a4 |
| 24-Nov-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: fix to avoid REQ_TIME and CP_TIME collision
Lei Li reported a issue: if foreground operations are frequent, background checkpoint may be always skipped due to below check, result in losing mor
f2fs: fix to avoid REQ_TIME and CP_TIME collision
Lei Li reported a issue: if foreground operations are frequent, background checkpoint may be always skipped due to below check, result in losing more data after sudden power-cut.
f2fs_balance_fs_bg() ... if (!is_idle(sbi, REQ_TIME) && (!excess_dirty_nats(sbi) && !excess_dirty_nodes(sbi))) return;
E.g: cp_interval = 5 second idle_interval = 2 second foreground operation interval = 1 second (append 1 byte per second into file)
In such case, no matter when it calls f2fs_balance_fs_bg(), is_idle(, REQ_TIME) returns false, result in skipping background checkpoint.
This patch changes as below to make trigger condition being more reasonable: - trigger sync_fs() if dirty_{nats,nodes} and prefree segs exceeds threshold; - skip triggering sync_fs() if there is any background inflight IO or there is foreground operation recently and meanwhile cp_rwsem is being held by someone;
Reported-by: Lei Li <noctis.akm@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
8769918b |
| 22-Nov-2020 |
Sahitya Tummala <stummala@codeaurora.org> |
f2fs: change to use rwsem for cp_mutex
Use rwsem to ensure serialization of the callers and to avoid starvation of high priority tasks, when the system is under heavy IO workload.
Signed-off-by: Sa
f2fs: change to use rwsem for cp_mutex
Use rwsem to ensure serialization of the callers and to avoid starvation of high priority tasks, when the system is under heavy IO workload.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
7ad08a58 |
| 19-Nov-2020 |
Daniel Rosenberg <drosen@google.com> |
f2fs: Handle casefolding with Encryption
Expand f2fs's casefolding support to include encrypted directories. To index casefolded+encrypted directories, we use the SipHash of the casefolded name, ke
f2fs: Handle casefolding with Encryption
Expand f2fs's casefolding support to include encrypted directories. To index casefolded+encrypted directories, we use the SipHash of the casefolded name, keyed by a key derived from the directory's fscrypt master key. This ensures that the dirhash doesn't leak information about the plaintext filenames.
Encryption keys are unavailable during roll-forward recovery, so we can't compute the dirhash when recovering a new dentry in an encrypted + casefolded directory. To avoid having to force a checkpoint when a new file is fsync'ed, store the dirhash on-disk appended to i_name.
This patch incorporates work by Eric Biggers <ebiggers@google.com> and Jaegeuk Kim <jaegeuk@kernel.org>.
Co-developed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
bb9cd910 |
| 19-Nov-2020 |
Daniel Rosenberg <drosen@google.com> |
fscrypt: Have filesystems handle their d_ops
This shifts the responsibility of setting up dentry operations from fscrypt to the individual filesystems, allowing them to have their own operations whi
fscrypt: Have filesystems handle their d_ops
This shifts the responsibility of setting up dentry operations from fscrypt to the individual filesystems, allowing them to have their own operations while still setting fscrypt's d_revalidate as appropriate.
Most filesystems can just use generic_set_encrypted_ci_d_ops, unless they have their own specific dentry operations as well. That operation will set the minimal d_ops required under the circumstances.
Since the fscrypt d_ops are set later on, we must set all d_ops there, since we cannot adjust those later on. This should not result in any change in behavior.
Signed-off-by: Daniel Rosenberg <drosen@google.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
8446fe92 |
| 24-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
block: switch partition lookup to use struct block_device
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path.
Signed-off-by: Chris
block: switch partition lookup to use struct block_device
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
bfc2b7e8 |
| 18-Nov-2020 |
Eric Biggers <ebiggers@google.com> |
f2fs: prevent creating duplicate encrypted filenames
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to create a duplicate filename in an encrypted directory by creating a file
f2fs: prevent creating duplicate encrypted filenames
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to create a duplicate filename in an encrypted directory by creating a file concurrently with adding the directory's encryption key.
Fix this bug on f2fs by rejecting no-key dentries in f2fs_add_link().
Note that the weird check for the current task in f2fs_do_add_link() seems to make this bug difficult to reproduce on f2fs.
Fixes: 9ea97163c6da ("f2fs crypto: add filename encryption for f2fs_add_link") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201118075609.120337-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
#
fa4320ce |
| 02-Nov-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: move ioctl interface definitions to separated file
Like other filesystem does, we introduce a new file f2fs.h in path of include/uapi/linux/, and move f2fs-specified ioctl interface definition
f2fs: move ioctl interface definitions to separated file
Like other filesystem does, we introduce a new file f2fs.h in path of include/uapi/linux/, and move f2fs-specified ioctl interface definitions to that file, after then, in order to use those definitions, userspace developer only need to include the new header file rather than copy & paste definitions from fs/f2fs/f2fs.h.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
86f33603 |
| 02-Oct-2020 |
Jaegeuk Kim <jaegeuk@kernel.org> |
f2fs: handle errors of f2fs_get_meta_page_nofail
First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on f2fs_get_meta_page_nofail().
Quick fix was not to give any error with infinite lo
f2fs: handle errors of f2fs_get_meta_page_nofail
First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on f2fs_get_meta_page_nofail().
Quick fix was not to give any error with infinite loop, but syzbot caught a case where it goes to that loop from fuzzed image. In turned out we abused f2fs_get_meta_page_nofail() like in the below call stack.
- f2fs_fill_super - f2fs_build_segment_manager - build_sit_entries - get_current_sit_page
INFO: task syz-executor178:6870 can't die for more than 143 seconds. task:syz-executor178 state:R stack:26960 pid: 6870 ppid: 6869 flags:0x00004006 Call Trace:
Showing all locks held in the system: 1 lock held by khungtaskd/1179: #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242 1 lock held by systemd-journal/3920: 1 lock held by in:imklog/6769: #0: ffff88809eebc130 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930 1 lock held by syz-executor178/6870: #0: ffff8880925120e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229
Actually, we didn't have to use _nofail in this case, since we could return error to mount(2) already with the error handler.
As a result, this patch tries to 1) remove _nofail callers as much as possible, 2) deal with error case in last remaining caller, f2fs_get_sum_page().
Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
48046cb5 |
| 07-Oct-2020 |
Jaegeuk Kim <jaegeuk@kernel.org> |
f2fs: fix memory alignment to support 32bit
In 32bit system, 64-bits key breaks memory alignment. This fixes the commit "f2fs: support 64-bits key in f2fs rb-tree node entry".
Reported-by: Nicolas
f2fs: fix memory alignment to support 32bit
In 32bit system, 64-bits key breaks memory alignment. This fixes the commit "f2fs: support 64-bits key in f2fs rb-tree node entry".
Reported-by: Nicolas Chauvet <kwizart@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
c68d6c88 |
| 14-Sep-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: compress: introduce cic/dic slab cache
Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory allocation of compress_io_ctx and decompress_io_ctx structure.
Signed-off-by: Chao
f2fs: compress: introduce cic/dic slab cache
Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory allocation of compress_io_ctx and decompress_io_ctx structure.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
31083031 |
| 14-Sep-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: compress: introduce page array slab cache
Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory allocation of page pointer array in compress context.
Signed-off-by: Chao Yu <yucha
f2fs: compress: introduce page array slab cache
Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory allocation of page pointer array in compress context.
Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: Fix wrong memory allocation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
ac4acb1f |
| 16-Sep-2020 |
Eric Biggers <ebiggers@google.com> |
fscrypt: handle test_dummy_encryption in more logical way
The behavior of the test_dummy_encryption mount option is that when a new file (or directory or symlink) is created in an unencrypted direct
fscrypt: handle test_dummy_encryption in more logical way
The behavior of the test_dummy_encryption mount option is that when a new file (or directory or symlink) is created in an unencrypted directory, it's automatically encrypted using a dummy encryption policy. That's it; in particular, the encryption (or lack thereof) of existing files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird and confusing. When test_dummy_encryption is enabled and a file is being created in an unencrypted directory, we set up an encryption key (->i_crypt_info) for the directory. This isn't actually used to do any encryption, however, since the directory is still unencrypted! Instead, ->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a "dummy context" (policy + nonce) instead of a "dummy policy". In commit ed318a6cc0b6 ("fscrypt: support test_dummy_encryption=v2"), I mistakenly thought this was required. However, actually the nonce only ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for 'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge case that can be forgotten about. For example, currently FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the dummy encryption policy when the filesystem is mounted with test_dummy_encryption. That seems like the wrong thing to do, since again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation where the dummy encryption policy inheritance is done without setting up keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the encryption policy to inherit from a directory. This can be a real policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc. with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
#
e075b690 |
| 16-Sep-2020 |
Eric Biggers <ebiggers@google.com> |
f2fs: use fscrypt_prepare_new_inode() and fscrypt_set_context()
Convert f2fs to use the new functions fscrypt_prepare_new_inode() and fscrypt_set_context(). This avoids calling fscrypt_get_encrypti
f2fs: use fscrypt_prepare_new_inode() and fscrypt_set_context()
Convert f2fs to use the new functions fscrypt_prepare_new_inode() and fscrypt_set_context(). This avoids calling fscrypt_get_encryption_info() from under f2fs_lock_op(), which can deadlock because fscrypt_get_encryption_info() isn't GFP_NOFS-safe.
For more details about this problem, see the earlier patch "fscrypt: add fscrypt_prepare_new_inode() and fscrypt_set_context()".
This also fixes a f2fs-specific deadlock when the filesystem is mounted with '-o test_dummy_encryption' and a file is created in an unencrypted directory other than the root directory:
INFO: task touch:207 blocked for more than 30 seconds. Not tainted 5.9.0-rc4-00099-g729e3d0919844 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:touch state:D stack: 0 pid: 207 ppid: 167 flags:0x00000000 Call Trace: [...] lock_page include/linux/pagemap.h:548 [inline] pagecache_get_page+0x25e/0x310 mm/filemap.c:1682 find_or_create_page include/linux/pagemap.h:348 [inline] grab_cache_page include/linux/pagemap.h:424 [inline] f2fs_grab_cache_page fs/f2fs/f2fs.h:2395 [inline] f2fs_grab_cache_page fs/f2fs/f2fs.h:2373 [inline] __get_node_page.part.0+0x39/0x2d0 fs/f2fs/node.c:1350 __get_node_page fs/f2fs/node.c:35 [inline] f2fs_get_node_page+0x2e/0x60 fs/f2fs/node.c:1399 read_inline_xattr+0x88/0x140 fs/f2fs/xattr.c:288 lookup_all_xattrs+0x1f9/0x2c0 fs/f2fs/xattr.c:344 f2fs_getxattr+0x9b/0x160 fs/f2fs/xattr.c:532 f2fs_get_context+0x1e/0x20 fs/f2fs/super.c:2460 fscrypt_get_encryption_info+0x9b/0x450 fs/crypto/keysetup.c:472 fscrypt_inherit_context+0x2f/0xb0 fs/crypto/policy.c:640 f2fs_init_inode_metadata+0xab/0x340 fs/f2fs/dir.c:540 f2fs_add_inline_entry+0x145/0x390 fs/f2fs/inline.c:621 f2fs_add_dentry+0x31/0x80 fs/f2fs/dir.c:757 f2fs_do_add_link+0xcd/0x130 fs/f2fs/dir.c:798 f2fs_add_link fs/f2fs/f2fs.h:3234 [inline] f2fs_create+0x104/0x290 fs/f2fs/namei.c:344 lookup_open.isra.0+0x2de/0x500 fs/namei.c:3103 open_last_lookups+0xa9/0x340 fs/namei.c:3177 path_openat+0x8f/0x1b0 fs/namei.c:3365 do_filp_open+0x87/0x130 fs/namei.c:3395 do_sys_openat2+0x96/0x150 fs/open.c:1168 [...]
That happened because f2fs_add_inline_entry() locks the directory inode's page in order to add the dentry, then f2fs_get_context() tries to lock it recursively in order to read the encryption xattr. This problem is specific to "test_dummy_encryption" because normally the directory's fscrypt_info would be set up prior to f2fs_add_inline_entry() in order to encrypt the new filename.
Regardless, the new design fixes this test_dummy_encryption deadlock as well as potential deadlocks with fs reclaim, by setting up any needed fscrypt_info structs prior to taking so many locks.
The test_dummy_encryption deadlock was reported by Daniel Rosenberg.
Reported-by: Daniel Rosenberg <drosen@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20200917041136.178600-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
#
78134d03 |
| 07-Sep-2020 |
Daeho Jeong <daehojeong@google.com> |
f2fs: change return value of f2fs_disable_compressed_file to bool
The returned integer is not required anywhere. So we need to change the return value to bool type.
Signed-off-by: Daeho Jeong <daeh
f2fs: change return value of f2fs_disable_compressed_file to bool
The returned integer is not required anywhere. So we need to change the return value to bool type.
Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
c2759eba |
| 07-Sep-2020 |
Daeho Jeong <daehojeong@google.com> |
f2fs: change i_compr_blocks of inode to atomic value
writepages() can be concurrently invoked for the same file by different threads such as a thread fsyncing the file and a kworker kernel thread. S
f2fs: change i_compr_blocks of inode to atomic value
writepages() can be concurrently invoked for the same file by different threads such as a thread fsyncing the file and a kworker kernel thread. So, changing i_compr_blocks without protection is racy and we need to protect it by changing it with atomic type value. Plus, we don't need a 64bit value for i_compr_blocks, so just we will use a atomic value, not atomic64.
Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
0e2b7385 |
| 02-Sep-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: allocate proper size memory for zstd decompress
As 5kft <5kft@5kft.org> reported:
kworker/u9:3: page allocation failure: order:9, mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,
f2fs: allocate proper size memory for zstd decompress
As 5kft <5kft@5kft.org> reported:
kworker/u9:3: page allocation failure: order:9, mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 3 PID: 8168 Comm: kworker/u9:3 Tainted: G C 5.8.3-sunxi #trunk Hardware name: Allwinner sun8i Family Workqueue: f2fs_post_read_wq f2fs_post_read_work [<c010d6d5>] (unwind_backtrace) from [<c0109a55>] (show_stack+0x11/0x14) [<c0109a55>] (show_stack) from [<c056d489>] (dump_stack+0x75/0x84) [<c056d489>] (dump_stack) from [<c0243b53>] (warn_alloc+0xa3/0x104) [<c0243b53>] (warn_alloc) from [<c024473b>] (__alloc_pages_nodemask+0xb87/0xc40) [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>] (kmalloc_order+0x19/0x38) [<c02267c5>] (kmalloc_order) from [<c02267fd>] (kmalloc_order_trace+0x19/0x90) [<c02267fd>] (kmalloc_order_trace) from [<c047c665>] (zstd_init_decompress_ctx+0x21/0x88) [<c047c665>] (zstd_init_decompress_ctx) from [<c047e9cf>] (f2fs_decompress_pages+0x97/0x228) [<c047e9cf>] (f2fs_decompress_pages) from [<c045d0ab>] (__read_end_io+0xfb/0x130) [<c045d0ab>] (__read_end_io) from [<c045d141>] (f2fs_post_read_work+0x61/0x84) [<c045d141>] (f2fs_post_read_work) from [<c0130b2f>] (process_one_work+0x15f/0x3b0) [<c0130b2f>] (process_one_work) from [<c0130e7b>] (worker_thread+0xfb/0x3e0) [<c0130e7b>] (worker_thread) from [<c0135c3b>] (kthread+0xeb/0x10c) [<c0135c3b>] (kthread) from [<c0100159>]
zstd may allocate large size memory for {,de}compression, it may cause file copy failure on low-end device which has very few memory.
For decompression, let's just allocate proper size memory based on current file's cluster size instead of max cluster size.
Reported-by: 5kft <5kft@5kft.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
ae999bb9 |
| 30-Aug-2020 |
Daeho Jeong <daehojeong@google.com> |
f2fs: change compr_blocks of superblock info to 64bit
Current compr_blocks of superblock info is not 64bit value. We are accumulating each i_compr_blocks count of inodes to this value and those are
f2fs: change compr_blocks of superblock info to 64bit
Current compr_blocks of superblock info is not 64bit value. We are accumulating each i_compr_blocks count of inodes to this value and those are 64bit values. So, need to change this to 64bit value.
Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
093749e2 |
| 04-Aug-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm: - valid blocks is one of key factors during cost overhead calculation, so if segment
f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm: - valid blocks is one of key factors during cost overhead calculation, so if segment has less valid block, however even its age is young or it locates hot segment, CB algorithm will still choose the segment as victim, it's not appropriate. - GCed data/node will go to existing logs, no matter in-there datas' update frequency is the same or not, it may mix hot and cold data again. - GC alloctor mainly use LFS type segment, it will cost free segment more quickly.
This patch introduces a new algorithm named age threshold based garbage collection to solve above issues, there are three steps mainly:
1. select a source victim: - set an age threshold, and select candidates beased threshold: e.g. 0 means youngest, 100 means oldest, if we set age threshold to 80 then select dirty segments which has age in range of [80, 100] as candiddates; - set candidate_ratio threshold, and select candidates based the ratio, so that we can shrink candidates to those oldest segments; - select target segment with fewest valid blocks in order to migrate blocks with minimum cost;
2. select a target victim: - select candidates beased age threshold; - set candidate_radius threshold, search candidates whose age is around source victims, searching radius should less than the radius threshold. - select target segment with most valid blocks in order to avoid migrating current target segment.
3. merge valid blocks from source victim into target victim with SSR alloctor.
Test steps: - create 160 dirty segments: * half of them have 128 valid blocks per segment * left of them have 384 valid blocks per segment - run background GC
Benefit: GC count and block movement count both decrease obviously:
- Before: - Valid: 86 - Dirty: 1 - Prefree: 11 - Free: 6001 (6001)
GC calls: 162 (BG: 220) - data segments : 160 (160) - node segments : 2 (2) Try to move 41454 blocks (BG: 41454) - data blocks : 40960 (40960) - node blocks : 494 (494)
IPU: 0 blocks SSR: 0 blocks in 0 segments LFS: 41364 blocks in 81 segments
- After:
- Valid: 87 - Dirty: 0 - Prefree: 4 - Free: 6008 (6008)
GC calls: 75 (BG: 76) - data segments : 74 (74) - node segments : 1 (1) Try to move 12813 blocks (BG: 12813) - data blocks : 12544 (12544) - node blocks : 269 (269)
IPU: 0 blocks SSR: 12032 blocks in 77 segments LFS: 855 blocks in 2 segments
Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
eca4873e |
| 08-Jul-2020 |
Daniel Rosenberg <drosen@google.com> |
f2fs: Use generic casefolding support
This switches f2fs over to the generic support provided in the previous patch.
Since casefolded dentries behave the same in ext4 and f2fs, we decrease the main
f2fs: Use generic casefolding support
This switches f2fs over to the generic support provided in the previous patch.
Since casefolded dentries behave the same in ext4 and f2fs, we decrease the maintenance burden by unifying them, and any optimizations will immediately apply to both.
Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
e6c3948d |
| 10-Aug-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: compress: use more readable atomic_t type for {cic,dic}.ref
refcount_t type variable should never be less than one, so it's a little bit hard to understand when we use it to indicate pending c
f2fs: compress: use more readable atomic_t type for {cic,dic}.ref
refcount_t type variable should never be less than one, so it's a little bit hard to understand when we use it to indicate pending compressed page count, let's change to use atomic_t for better readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
2e9b2bb2 |
| 04-Aug-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: support 64-bits key in f2fs rb-tree node entry
then, we can add specified entry into rb-tree with 64-bits segment time as key.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaege
f2fs: support 64-bits key in f2fs rb-tree node entry
then, we can add specified entry into rb-tree with 64-bits segment time as key.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
c5d02785 |
| 04-Aug-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: inherit mtime of original block during GC
Don't let f2fs inner GC ruins original aging degree of segment.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kerne
f2fs: inherit mtime of original block during GC
Don't let f2fs inner GC ruins original aging degree of segment.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
d0b9e42a |
| 04-Aug-2020 |
Chao Yu <yuchao0@huawei.com> |
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will: - allocate new segment on cold data log no matter whether last used segment is partially used or not, it make
f2fs: introduce inmem curseg
Previous implementation of aligned pinfile allocation will: - allocate new segment on cold data log no matter whether last used segment is partially used or not, it makes IOs more random; - force concurrent cold data/GCed IO going into warm data area, it can make a bad effect on hot/cold data separation;
In this patch, we introduce a new type of log named 'inmem curseg', the differents from normal curseg is: - it reuses existed segment type (CURSEG_XXX_NODE/DATA); - it only exists in memory, its segno, blkofs, summary will not b persisted into checkpoint area;
With this new feature, we can enhance scalability of log, special allocators can be created for purposes: - pure lfs allocator for aligned pinfile allocation or file defragmentation - pure ssr allocator for later feature
So that, let's update aligned pinfile allocation to use this new inmem curseg fwk.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
e90027d2 |
| 04-Aug-2020 |
Xiaojun Wang <wangxiaojun11@huawei.com> |
f2fs: remove duplicated type casting
Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been converted as unsigned long type, we don't need do type casting again.
Signed-off-by: Xiaojun
f2fs: remove duplicated type casting
Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been converted as unsigned long type, we don't need do type casting again.
Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reported-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|
#
de881df9 |
| 16-Jul-2020 |
Aravind Ramesh <aravind.ramesh@wdc.com> |
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size. Zone-capacity indicates the maximum number of sectors that are usable in a zo
f2fs: support zone capacity less than zone size
NVMe Zoned Namespace devices can have zone-capacity less than zone-size. Zone-capacity indicates the maximum number of sectors that are usable in a zone beginning from the first sector of the zone. This makes the sectors sectors after the zone-capacity till zone-size to be unusable. This patch set tracks zone-size and zone-capacity in zoned devices and calculate the usable blocks per segment and usable segments per section.
If zone-capacity is less than zone-size mark only those segments which start before zone-capacity as free segments. All segments at and beyond zone-capacity are treated as permanently used segments. In cases where zone-capacity does not align with segment size the last segment will start before zone-capacity and end beyond the zone-capacity of the zone. For such spanning segments only sectors within the zone-capacity are used.
During writes and GC manage the usable segments in a section and usable blocks per segment. Segments which are beyond zone-capacity are never allocated, and do not need to be garbage collected, only the segments which are before zone-capacity needs to garbage collected. For spanning segments based on the number of usable blocks in that segment, write to blocks only up to zone-capacity.
Zone-capacity is device specific and cannot be configured by the user. Since NVMe ZNS device zones are sequentially write only, a block device with conventional zones or any normal block device is needed along with the ZNS device for the metadata operations of F2fs.
A typical nvme-cli output of a zoned device shows zone start and capacity and write pointer as below:
SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones are in EMPTY state. For each zone, only zone start + 49MB is usable area, any lba/sector after 49MB cannot be read or written to, the drive will fail any attempts to read/write. So, the second zone starts at 64MB and is usable till 113MB (64 + 49) and the range between 113 and 128MB is again unusable. The next zone starts at 128MB, and so on.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
show more ...
|