#
b7620416 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: pass btrfs_inode to btrfs_submit_data_read_bio
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David
btrfs: pass btrfs_inode to btrfs_submit_data_read_bio
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
535a7e5d |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: pass btrfs_inode to btrfs_submit_data_write_bio
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David
btrfs: pass btrfs_inode to btrfs_submit_data_write_bio
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
bfa17066 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: pass btrfs_inode to btrfs_submit_bio_start_direct_io
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by:
btrfs: pass btrfs_inode to btrfs_submit_bio_start_direct_io
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
882681ac |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: pass btrfs_inode to btrfs_submit_bio_start
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Ster
btrfs: pass btrfs_inode to btrfs_submit_bio_start
The function is for internal interfaces so we should use the btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ad65ecf3 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: simplify btree_submit_bio_start and btrfs_submit_bio_start parameters
After previous patches the unused parameters can be removed from btree_submit_bio_start and btrfs_submit_bio_start as the
btrfs: simplify btree_submit_bio_start and btrfs_submit_bio_start parameters
After previous patches the unused parameters can be removed from btree_submit_bio_start and btrfs_submit_bio_start as they don't need to conform to the extent_submit_bio_start_t typedef.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ab2072b2 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: change how submit bio callback is passed to btrfs_wq_submit_bio
There's a callback function parameter for btrfs_wq_submit_bio that can be one of: metadata, buffered data, direct io data. The
btrfs: change how submit bio callback is passed to btrfs_wq_submit_bio
There's a callback function parameter for btrfs_wq_submit_bio that can be one of: metadata, buffered data, direct io data. The callback abstraction is unnecessary as we have all functions available.
Replace the parameter with a command that leads to a direct call in run_one_async_start. The called functions can be then simplified and we can also remove the extent_submit_bio_start_t typedef.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
7920b773 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: drop parameter compression_type from btrfs_submit_dio_repair_bio
Compression and direct io don't work together so the compression parameter can be dropped after previous patch that changed th
btrfs: drop parameter compression_type from btrfs_submit_dio_repair_bio
Compression and direct io don't work together so the compression parameter can be dropped after previous patch that changed the call to direct.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
19af6a7d |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: change how repair action is passed to btrfs_repair_one_sector
There's a function pointer passed to btrfs_repair_one_sector that will submit the right bio for repair. However there are only tw
btrfs: change how repair action is passed to btrfs_repair_one_sector
There's a function pointer passed to btrfs_repair_one_sector that will submit the right bio for repair. However there are only two callbacks, for buffered and for direct IO. This can be simplified to a bool-based switch and call either function, indirect calls in this case is an unnecessary abstraction. This allows to remove the submit_bio_hook_t typedef.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2885fd63 |
| 26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move inode prototypes to btrfs_inode.h
I initially wanted to make a new header file for this, but these prototypes do naturally fit into btrfs_inode.h. If we want to extract vfs from pure bt
btrfs: move inode prototypes to btrfs_inode.h
I initially wanted to make a new header file for this, but these prototypes do naturally fit into btrfs_inode.h. If we want to extract vfs from pure btrfs code in the future we may need to split this up, but btrfs_inode embeds the vfs_inode, so it makes sense to put the prototypes in this header for now.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68 |
|
#
f60acad3 |
| 14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move btrfs_print_data_csum_error into inode.c
This isn't used outside of inode.c, there's no reason to define it in btrfs_inode.h. Drop the inline and add __cold as it's for errors that are n
btrfs: move btrfs_print_data_csum_error into inode.c
This isn't used outside of inode.c, there's no reason to define it in btrfs_inode.h. Drop the inline and add __cold as it's for errors that are not in any hot path.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9b9b8854 |
| 14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: use a runtime flag to indicate an inode is a free space inode
We always check the root of an inode as well as it's inode number to determine if it's a free space inode. This is problematic a
btrfs: use a runtime flag to indicate an inode is a free space inode
We always check the root of an inode as well as it's inode number to determine if it's a free space inode. This is problematic as the helper is in a header file where it doesn't have the fs_info definition. To avoid this and make the check a little cleaner simply add a flag to the runtime_flags to indicate that the inode is a free space inode, set that when we create the inode, and then change the helper to check for this flag.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
e256927b |
| 14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: open code and remove btrfs_insert_inode_hash helper
This exists to insert the btree_inode in the super blocks inode hash table. Since it's only used for the btree inode move the code to wher
btrfs: open code and remove btrfs_insert_inode_hash helper
This exists to insert the btree_inode in the super blocks inode hash table. Since it's only used for the btree inode move the code to where we use it in disk-io.c and remove the helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ee8ba05c |
| 14-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: open code and remove btrfs_inode_sectorsize helper
This is defined in btrfs_inode.h, and dereferences btrfs_root and btrfs_fs_info, both of which aren't defined in btrfs_inode.h. Additionally
btrfs: open code and remove btrfs_inode_sectorsize helper
This is defined in btrfs_inode.h, and dereferences btrfs_root and btrfs_fs_info, both of which aren't defined in btrfs_inode.h. Additionally, in many places we already have root or fs_info, so this helper often makes the code harder to read. So delete the helper and simply open code it in the few places that we use it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
87c11705 |
| 09-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: convert the io_failure_tree to a plain rb_tree
We still have this oddity of stashing the io_failure_record in the extent state for the io_failure_tree, which is leftover from when we used to
btrfs: convert the io_failure_tree to a plain rb_tree
We still have this oddity of stashing the io_failure_record in the extent state for the io_failure_tree, which is leftover from when we used to stuff private pointers in extent_io_trees.
However this doesn't make a lot of sense for the io failure records, we can simply use a normal rb_tree for this. This will allow us to further simplify the extent_io_tree code by removing the io_failure_rec pointer from the extent state.
Convert the io_failure_tree to an rb tree + spinlock in the inode, and then use our rb tree simple helpers to insert and find failed records. This greatly cleans up this code and makes it easier to separate out the extent_io_tree code.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54 |
|
#
cf2404a9 |
| 11-Jul-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: add optimized btrfs_ino() version for 64 bits systems
Currently btrfs_ino() tries to use first the objectid of the inode's location key. This is to avoid truncation of the inode number on 32
btrfs: add optimized btrfs_ino() version for 64 bits systems
Currently btrfs_ino() tries to use first the objectid of the inode's location key. This is to avoid truncation of the inode number on 32 bits platforms because the i_ino field of struct inode has the unsigned long type, while the objectid is a 64 bits unsigned type (u64) on every system. This logic was added in commit 33345d01522f81 ("Btrfs: Always use 64bit inode number").
However if we are running on a 64 bits system, we can always directly return the i_ino value from struct inode, which eliminates the need for he special if statement that tests for a location key type of BTRFS_ROOT_ITEM_KEY - in which case i_ino may not have the same value as the objectid in the inode's location objectid, it may have a value of BTRFS_EMPTY_SUBVOL_DIR_OBJECTID, for the case of snapshots of trees with subvolumes/snapshots inside them.
So add a special version for 64 bits system that directly returns i_ino of struct inode. This eliminates one branch and reduces the overall code size, since btrfs_ino() is an inline function that is extensively used.
Before:
$ size fs/btrfs/btrfs.ko text data bss dec hex filename 1617487 189240 29032 1835759 1c02ef fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko text data bss dec hex filename 1612028 189180 29032 1830240 1bed60 fs/btrfs/btrfs.ko
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
adac5584 |
| 11-Jul-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: set the objectid of the btree inode's location key
We currently don't use the location key of the btree inode, its content is set to zeroes, as it's a special inode that is not persisted (it
btrfs: set the objectid of the btree inode's location key
We currently don't use the location key of the btree inode, its content is set to zeroes, as it's a special inode that is not persisted (it has no inode item stored in any btree).
At btrfs_ino(), an inline function used extensively in btrfs, we have this special check if the given inode's location objectid is 0, and if it is, we return the value stored in the VFS' inode i_ino field instead (which is BTRFS_BTREE_INODE_OBJECTID for the btree inode).
To reduce the code at btrfs_ino(), we can simply set the objectid of the btree inode to the value BTRFS_BTREE_INODE_OBJECTID. This eliminates the need to check for the special case of the objectid being zero, with the side effect of reducing the overall code size and having less code to execute, as btrfs_ino() is an inline function.
Before:
$ size fs/btrfs/btrfs.ko text data bss dec hex filename 1620502 189240 29032 1838774 1c0eb6 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko text data bss dec hex filename 1617487 189240 29032 1835759 1c02ef fs/btrfs/btrfs.ko
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.53 |
|
#
0201fceb |
| 06-Jul-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove the inode cache check at btrfs_is_free_space_inode()
The inode cache feature was removed in kernel 5.11, and we no longer have any code that reads from or writes to inode caches. We ma
btrfs: remove the inode cache check at btrfs_is_free_space_inode()
The inode cache feature was removed in kernel 5.11, and we no longer have any code that reads from or writes to inode caches. We may still mount a filesystem that has inode caches, but they are ignored.
Remove the check for an inode cache from btrfs_is_free_space_inode(), since we no longer have code to trigger reads from an inode cache or writes to an inode cache. The check at send.c is still needed, because in case we find a filesystem with an inode cache, we must ignore it. Also leave the checks at tree-checker.c, as they are sanity checks.
This eliminates a dead branch and reduces the amount of code since it's in an inline function.
Before:
$ size fs/btrfs/btrfs.ko text data bss dec hex filename 1620662 189240 29032 1838934 1c0f56 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko text data bss dec hex filename 1620502 189240 29032 1838774 1c0eb6 fs/btrfs/btrfs.ko
Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38 |
|
#
a3e171a0 |
| 05-May-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: move struct btrfs_dio_private to inode.c
The btrfs_dio_private structure is only used in inode.c, so move the definition there.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by
btrfs: move struct btrfs_dio_private to inode.c
The btrfs_dio_private structure is only used in inode.c, so move the definition there.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
acb8b52a |
| 05-May-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove the disk_bytenr in struct btrfs_dio_private
This field is never used, so remove it. Last use was probably in 23ea8e5a0767 ("Btrfs: load checksum data once when submitting a direct read
btrfs: remove the disk_bytenr in struct btrfs_dio_private
This field is never used, so remove it. Last use was probably in 23ea8e5a0767 ("Btrfs: load checksum data once when submitting a direct read io").
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.37, v5.15.36, v5.15.35 |
|
#
e6f9d696 |
| 15-Apr-2022 |
Chung-Chiang Cheng <cccheng@synology.com> |
btrfs: export a helper for compression hard check
inode_can_compress will be used outside of inode.c to check the availability of setting compression flag by xattr. This patch moves this function as
btrfs: export a helper for compression hard check
inode_can_compress will be used outside of inode.c to check the availability of setting compression flag by xattr. This patch moves this function as an internal helper and renames it to btrfs_inode_can_compress.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25 |
|
#
23e3337f |
| 17-Feb-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: reset last_reflink_trans after fsyncing inode
When an inode has a last_reflink_trans matching the current transaction, we have to take special care when logging its checksums in order to avoi
btrfs: reset last_reflink_trans after fsyncing inode
When an inode has a last_reflink_trans matching the current transaction, we have to take special care when logging its checksums in order to avoid getting checksum items with overlapping ranges in a log tree, which could result in missing checksums after log replay (more on that in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents")). We also need to make sure a full fsync will copy all old file extent items it finds in modified leaves, because they might have been copied from some other inode.
However once we fsync an inode, we don't need to keep paying the price of that extra special care in future fsyncs done in the same transaction, unless the inode is used for another reflink operation or the full sync flag is set on it (truncate, failure to allocate extent maps for holes, and other exceptional and infrequent cases).
So after we fsync an inode reset its last_unlink_trans to zero. In case another reflink happens, we continue to update the last_reflink_trans of the inode, just as before. Also set last_reflink_trans to the generation of the last transaction that modified the inode whenever we need to set the full sync flag on the inode, just like when we need to load an inode from disk after eviction.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9 |
|
#
528ee697 |
| 15-Dec-2021 |
Filipe Manana <fdmanana@suse.com> |
btrfs: put initial index value of a directory in a constant
At btrfs_set_inode_index_count() we refer twice to the number 2 as the initial index value for a directory (when it's empty), with a prope
btrfs: put initial index value of a directory in a constant
At btrfs_set_inode_index_count() we refer twice to the number 2 as the initial index value for a directory (when it's empty), with a proper comment explaining the reason for that value. In the next patch I'll have to use that magic value in the directory logging code, so put the value in a #define at btrfs_inode.h, to avoid hardcoding the magic value again at tree-log.c.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15 |
|
#
339d0354 |
| 25-Oct-2021 |
Filipe Manana <fdmanana@suse.com> |
btrfs: only copy dir index keys when logging a directory
Currently, when logging a directory, we copy both dir items and dir index items from the fs/subvolume tree to the log tree. Both items have e
btrfs: only copy dir index keys when logging a directory
Currently, when logging a directory, we copy both dir items and dir index items from the fs/subvolume tree to the log tree. Both items have exactly the same data (same struct btrfs_dir_item), the difference lies in the key values, where a dir index key contains the index number of a directory entry while the dir item key does not, as it's used for doing fast lookups of an entry by name, while the former is used for sorting entries when listing a directory.
We can exploit that and log only the dir index items, since they contain all the information needed to correctly add, replace and delete directory entries when replaying a log tree. Logging only the dir index items is also backward and forward compatible: an unpatched kernel (without this change) can correctly replay a log tree generated by a patched kernel (with this patch), and a patched kernel can correctly replay a log tree generated by an unpatched kernel.
The backward compatibility is ensured because:
1) For inserting a new dentry: a dentry is only inserted when we find a new dir index key - we can only insert if we know the dir index offset, which is encoded in the dir index key's offset;
2) For deleting dentries: during log replay, before adding or replacing dentries, we first replay dentry deletions. Whenever we find a dir item key or a dir index key in the subvolume/fs tree that is not logged in a range for which the log tree is authoritative, we do the unlink of the dentry, which removes both the existing dir item key and the dir index key. Therefore logging just dir index keys is enough to ensure dentry deletions are correctly replayed;
3) For dentry replacements: they work when we log only dir index keys and this is mostly due to a combination of 1) and 2). If we replace a dentry with name "foobar" to point from inode A to inode B, then we know the dir index key for the new dentry is different from the old one, as it has an index number (key offset) larger than the old one. This results in replaying a deletion, through replay_dir_deletes(), that causes the old dentry to be removed, both the dir item key and the dir index key, as mentioned at 2). Then when processing the new dir index key, we add the new dentry, adding both a new dir item key and a new index key pointing to inode B, as stated in 1).
The forward compatibility, the ability for a patched kernel to replay a log created by an older, unpatched kernel, comes from the changes required for making sure we are able to replay a log that only contains dir index keys - we simply ignore every dir item key we find.
So modify directory logging to log only dir index items, and modify the log replay process to ignore dir item keys, from log trees created by an unpatched kernel, and process only with dir index keys. This reduces the amount of logged metadata by about half, and therefore the time spent logging or fsyncing large directories (less CPU time and less IO).
The following test script was used to measure this change:
#!/bin/bash
DEV=/dev/nvme0n1 MNT=/mnt/nvme0n1
NUM_NEW_FILES=1000000 NUM_FILE_DELETES=10000
mkfs.btrfs -f $DEV mount -o ssd $DEV $MNT
mkdir $MNT/testdir
for ((i = 1; i <= $NUM_NEW_FILES; i++)); do echo -n > $MNT/testdir/file_$i done
start=$(date +%s%N) xfs_io -c "fsync" $MNT/testdir end=$(date +%s%N)
dur=$(( (end - start) / 1000000 )) echo "dir fsync took $dur ms after adding $NUM_NEW_FILES files"
# sync to force transaction commit and wipeout the log. sync
del_inc=$(( $NUM_NEW_FILES / $NUM_FILE_DELETES )) for ((i = 1; i <= $NUM_NEW_FILES; i += $del_inc)); do rm -f $MNT/testdir/file_$i done
start=$(date +%s%N) xfs_io -c "fsync" $MNT/testdir end=$(date +%s%N)
dur=$(( (end - start) / 1000000 )) echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files" echo
umount $MNT
The tests were run on a physical machine, with a non-debug kernel (Debian's default kernel config), for different values of $NUM_NEW_FILES and $NUM_FILE_DELETES, and the results were the following:
** Before patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **
dir fsync took 8412 ms after adding 1000000 files dir fsync took 500 ms after deleting 10000 files
** After patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **
dir fsync took 4252 ms after adding 1000000 files (-49.5%) dir fsync took 269 ms after deleting 10000 files (-46.2%)
** Before patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 745 ms after adding 100000 files dir fsync took 59 ms after deleting 1000 files
** After patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 404 ms after adding 100000 files (-45.8%) dir fsync took 31 ms after deleting 1000 files (-47.5%)
** Before patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 67 ms after adding 10000 files dir fsync took 9 ms after deleting 1000 files
** After patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **
dir fsync took 36 ms after adding 10000 files (-46.3%) dir fsync took 5 ms after deleting 1000 files (-44.4%)
** Before patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **
dir fsync took 9 ms after adding 1000 files dir fsync took 4 ms after deleting 100 files
** After patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **
dir fsync took 7 ms after adding 1000 files (-22.2%) dir fsync took 3 ms after deleting 100 files (-25.0%)
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.14.14, v5.14.13, v5.14.12, v5.14.11 |
|
#
47926ab5 |
| 08-Oct-2021 |
Qu Wenruo <wqu@suse.com> |
btrfs: rename btrfs_dio_private::logical_offset to file_offset
The naming of "logical_offset" can be confused with logical bytenr of the dio range.
In fact it's file offset, and the naming "file_of
btrfs: rename btrfs_dio_private::logical_offset to file_offset
The naming of "logical_offset" can be confused with logical bytenr of the dio range.
In fact it's file offset, and the naming "file_offset" is already widely used in all other sites.
Just do the rename to avoid confusion.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66 |
|
#
dc287224 |
| 16-Sep-2021 |
Filipe Manana <fdmanana@suse.com> |
btrfs: keep track of the last logged keys when logging a directory
After the first time we log a directory in the current transaction, for each directory item in a changed leaf of the subvolume tree
btrfs: keep track of the last logged keys when logging a directory
After the first time we log a directory in the current transaction, for each directory item in a changed leaf of the subvolume tree, we have to check if we previously logged the item, in order to overwrite it in case its data changed or skip it in case its data hasn't changed.
Checking if we have logged each item before not only wastes times, but it also adds lock contention on the log tree. So in order to minimize the number of times we do such checks, keep track of the offset of the last key we logged for a directory and, on the next time we log the directory, skip the checks for any new keys that have an offset greater than the offset we have previously saved. This is specially effective for index keys, because the offset for these keys comes from a monotonically increasing counter.
This patch is part of a patchset comprised of the following 5 patches:
btrfs: remove root argument from btrfs_log_inode() and its callees btrfs: remove redundant log root assignment from log_dir_items() btrfs: factor out the copying loop of dir items from log_dir_items() btrfs: insert items in batches when logging a directory when possible btrfs: keep track of the last logged keys when logging a directory
This is patch 5/5.
The following test was used on a non-debug kernel to measure the impact it has on a directory fsync:
$ cat test-dir-fsync.sh #!/bin/bash
DEV=/dev/nvme0n1 MNT=/mnt/nvme0n1
NUM_NEW_FILES=100000 NUM_FILE_DELETES=1000
mkfs.btrfs -f $DEV mount -o ssd $DEV $MNT
mkdir $MNT/testdir
for ((i = 1; i <= $NUM_NEW_FILES; i++)); do echo -n > $MNT/testdir/file_$i done
# fsync the directory, this will log the new dir items and the inodes # they point to, because these are new inodes. start=$(date +%s%N) xfs_io -c "fsync" $MNT/testdir end=$(date +%s%N)
dur=$(( (end - start) / 1000000 )) echo "dir fsync took $dur ms after adding $NUM_NEW_FILES files"
# sync to force transaction commit and wipeout the log. sync
del_inc=$(( $NUM_NEW_FILES / $NUM_FILE_DELETES )) for ((i = 1; i <= $NUM_NEW_FILES; i += $del_inc)); do rm -f $MNT/testdir/file_$i done
# fsync the directory, this will only log dir items, there are no # dentries pointing to new inodes. start=$(date +%s%N) xfs_io -c "fsync" $MNT/testdir end=$(date +%s%N)
dur=$(( (end - start) / 1000000 )) echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
umount $MNT
Test results with NUM_NEW_FILES set to 100 000 and 1 000 000:
**** before patchset, 100 000 files, 1000 deletes ****
dir fsync took 848 ms after adding 100000 files dir fsync took 175 ms after deleting 1000 files
**** after patchset, 100 000 files, 1000 deletes ****
dir fsync took 758 ms after adding 100000 files (-11.2%) dir fsync took 63 ms after deleting 1000 files (-94.1%)
**** before patchset, 1 000 000 files, 1000 deletes ****
dir fsync took 9945 ms after adding 1000000 files dir fsync took 473 ms after deleting 1000 files
**** after patchset, 1 000 000 files, 1000 deletes ****
dir fsync took 8677 ms after adding 1000000 files (-13.6%) dir fsync took 146 ms after deleting 1000 files (-105.6%)
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|