Revision tags: v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18 |
|
#
1418bae1 |
| 23-Jan-2019 |
Qu Wenruo <wqu@suse.com> |
btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record [BUG] Btrfs/139 will fail with a high probability if the testing machine (VM) h
btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record [BUG] Btrfs/139 will fail with a high probability if the testing machine (VM) has only 2G RAM. Resulting the final write success while it should fail due to EDQUOT, and the fs will have quota exceeding the limit by 16K. The simplified reproducer will be: (needs a 2G ram VM) $ mkfs.btrfs -f $dev $ mount $dev $mnt $ btrfs subv create $mnt/subv $ btrfs quota enable $mnt $ btrfs quota rescan -w $mnt $ btrfs qgroup limit -e 1G $mnt/subv $ for i in $(seq -w 1 8); do xfs_io -f -c "pwrite 0 128M" $mnt/subv/file_$i > /dev/null echo "file $i written" > /dev/kmsg done $ sync $ btrfs qgroup show -pcre --raw $mnt The last pwrite will not trigger EDQUOT and final 'qgroup show' will show something like: qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 16384 16384 none none --- --- 0/256 1073758208 1073758208 none 1073741824 --- --- And 1073758208 is larger than > 1073741824. [CAUSE] It's a bug in btrfs qgroup data reserved space management. For quota limit, we must ensure that: reserved (data + metadata) + rfer/excl <= limit Since rfer/excl is only updated at transaction commmit time, reserved space needs to be taken special care. One important part of reserved space is data, and for a new data extent written to disk, we still need to take the reserved space until rfer/excl numbers get updated. Originally when an ordered extent finishes, we migrate the reserved qgroup data space from extent_io tree to delayed ref head of the data extent, expecting delayed ref will only be cleaned up at commit transaction time. However for small RAM machine, due to memory pressure dirty pages can be flushed back to disk without committing a transaction. The related events will be something like: file 1 written btrfs_finish_ordered_io: ino=258 ordered offset=0 len=54947840 btrfs_finish_ordered_io: ino=258 ordered offset=54947840 len=5636096 btrfs_finish_ordered_io: ino=258 ordered offset=61153280 len=57344 btrfs_finish_ordered_io: ino=258 ordered offset=61210624 len=8192 btrfs_finish_ordered_io: ino=258 ordered offset=60583936 len=569344 cleanup_ref_head: num_bytes=54947840 cleanup_ref_head: num_bytes=5636096 cleanup_ref_head: num_bytes=569344 cleanup_ref_head: num_bytes=57344 cleanup_ref_head: num_bytes=8192 ^^^^^^^^^^^^^^^^ This will free qgroup data reserved space file 2 written ... file 8 written cleanup_ref_head: num_bytes=8192 ... btrfs_commit_transaction <<< the only transaction committed during the test When file 2 is written, we have already freed 128M reserved qgroup data space for ino 258. Thus later write won't trigger EDQUOT. This allows us to write more data beyond qgroup limit. In my 2G ram VM, it could reach about 1.2G before hitting EDQUOT. [FIX] By moving reserved qgroup data space from btrfs_delayed_ref_head to btrfs_qgroup_extent_record, we can ensure that reserved qgroup data space won't be freed half way before commit transaction, thus fix the problem. Fixes: f64d5ca86821 ("btrfs: delayed_ref: Add new function to record reserved space into delayed ref") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7 |
|
#
d7baffda |
| 03-Dec-2018 |
Josef Bacik <jbacik@fb.com> |
btrfs: add btrfs_delete_ref_head helper We do this dance in cleanup_ref_head and check_ref_cleanup, unify it into a helper and cleanup the calling functions. Reviewed-by: Omar S
btrfs: add btrfs_delete_ref_head helper We do this dance in cleanup_ref_head and check_ref_cleanup, unify it into a helper and cleanup the calling functions. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14 |
|
#
9e920a6f |
| 11-Oct-2018 |
Lu Fengqi <lufq.fnst@cn.fujitsu.com> |
btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to bt
btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btrfs_delayed_ref_lock(). No functional change. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5637c74b |
| 11-Oct-2018 |
Lu Fengqi <lufq.fnst@cn.fujitsu.com> |
btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btr
btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btrfs_select_ref_head(). No functional change. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5 |
|
#
e3d03965 |
| 22-Aug-2018 |
Liu Bo <bo.liu@linux.alibaba.com> |
Btrfs: delayed-refs: use rb_first_cached for ref_tree rb_first_cached() trades an extra pointer "leftmost" for doing the same job as rb_first() but in O(1). Functions manipulati
Btrfs: delayed-refs: use rb_first_cached for ref_tree rb_first_cached() trades an extra pointer "leftmost" for doing the same job as rb_first() but in O(1). Functions manipulating href->ref_tree need to get the first entry, this converts href->ref_tree to use rb_first_cached(). For more details about the optimization see patch "Btrfs: delayed-refs: use rb_first_cached for href_root". Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5c9d028b |
| 22-Aug-2018 |
Liu Bo <bo.liu@linux.alibaba.com> |
Btrfs: delayed-refs: use rb_first_cached for href_root rb_first_cached() trades an extra pointer "leftmost" for doing the same job as rb_first() but in O(1). Functions manipulat
Btrfs: delayed-refs: use rb_first_cached for href_root rb_first_cached() trades an extra pointer "leftmost" for doing the same job as rb_first() but in O(1). Functions manipulating href_root need to get the first entry, this converts href_root to use rb_first_cached(). This patch is first in the sequenct of similar updates to other rbtrees and this is analysis of the expected behaviour and improvements. There's a common pattern: while (node = rb_first) { entry = rb_entry(node) next = rb_next(node) rb_erase(node) cleanup(entry) } rb_first needs to traverse the tree up to logN depth, rb_erase can completely reshuffle the tree. With the caching we'll skip the traversal in rb_first. That's a cached memory access vs looped pointer dereference trade-off that IMHO has a clear winner. Measurements show there's not much difference in a sample tree with 10000 nodes: 4.5s / rb_first and 4.8s / rb_first_cached. Real effects of caching and pointer chasing are unpredictable though. Further optimzations can be done to avoid the expensive rb_erase step. In some cases it's ok to process the nodes in any order, so the tree can be traversed in post-order, not rebalancing the children nodes and just calling free. Care must be taken regarding the next node. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog from mail discussions ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3 |
|
#
88a979c6 |
| 20-Jun-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Remove fs_info from btrfs_add_delayed_data_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes.
btrfs: Remove fs_info from btrfs_add_delayed_data_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
44e1c47d |
| 20-Jun-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Remove fs_info from btrfs_add_delayed_tree_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes.
btrfs: Remove fs_info from btrfs_add_delayed_tree_ref This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.17.2, v4.17.1, v4.17 |
|
#
be97f133 |
| 19-Apr-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Ster
btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs It's provided by the transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
41d0bd3b |
| 04-Apr-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Drop delayed_refs argument from btrfs_check_delayed_seq It's used to print its pointer in a debug statement but doesn't really bring any useful information to the error message.
btrfs: Drop delayed_refs argument from btrfs_check_delayed_seq It's used to print its pointer in a debug statement but doesn't really bring any useful information to the error message. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5e388e95 |
| 18-Apr-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Fix race condition between delayed refs and blockgroup removal When the delayed refs for a head are all run, eventually cleanup_ref_head is called which (in case of deletion) obta
btrfs: Fix race condition between delayed refs and blockgroup removal When the delayed refs for a head are all run, eventually cleanup_ref_head is called which (in case of deletion) obtains a reference for the relevant btrfs_space_info struct by querying the bg for the range. This is problematic because when the last extent of a bg is deleted a race window emerges between removal of that bg and the subsequent invocation of cleanup_ref_head. This can result in cache being null and either a null pointer dereference or assertion failure. task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000 RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs] RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292 RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8 RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001 R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0 R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0 FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs] btrfs_run_delayed_refs+0x68/0x250 [btrfs] btrfs_should_end_transaction+0x42/0x60 [btrfs] btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs] btrfs_evict_inode+0x4c6/0x5c0 [btrfs] evict+0xc6/0x190 do_unlinkat+0x19c/0x300 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fbf589c57a7 To fix this, introduce a new flag "is_system" to head_ref structs, which is populated at insertion time. This allows to decouple the querying for the spaceinfo from querying the possibly deleted bg. Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting") CC: stable@vger.kernel.org # 4.14+ Suggested-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9888c340 |
| 03-Apr-2018 |
David Sterba <dsterba@suse.com> |
btrfs: replace GPL boilerplate by SPDX -- headers Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Ad
btrfs: replace GPL boilerplate by SPDX -- headers Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Unify the include protection macros to match the file names. Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.16 |
|
#
e67c718b |
| 19-Feb-2018 |
David Sterba <dsterba@suse.com> |
btrfs: add more __cold annotations The __cold functions are placed to a special section, as they're expected to be called rarely. This could help i-cache prefetches or help compiler
btrfs: add more __cold annotations The __cold functions are placed to a special section, as they're expected to be called rarely. This could help i-cache prefetches or help compiler to decide which branches are more/less likely to be taken without any other annotations needed. Though we can't add more __exit annotations, it's still possible to add __cold (that's also added with __exit). That way the following function categories are tagged: - printf wrappers, error messages - exit helpers Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.15, v4.13.16, v4.14 |
|
#
f5c29bd9 |
| 02-Nov-2017 |
Liu Bo <bo.li.liu@oracle.com> |
Btrfs: add __init macro to btrfs init functions Adding __init macro gives kernel a hint that this function is only used during the initialization phase and its memory resources can be fr
Btrfs: add __init macro to btrfs init functions Adding __init macro gives kernel a hint that this function is only used during the initialization phase and its memory resources can be freed up after. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
0e0adbcf |
| 19-Oct-2017 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: track refs in a rb_tree instead of a list If we get a significant amount of delayed refs for a single block (think modifying multiple snapshots) we can end up spending an ungodly
btrfs: track refs in a rb_tree instead of a list If we get a significant amount of delayed refs for a single block (think modifying multiple snapshots) we can end up spending an ungodly amount of time looping through all of the entries trying to see if they can be merged. This is because we only add them to a list, so we have O(2n) for every ref head. This doesn't make any sense as we likely have refs for different roots, and so they cannot be merged. Tracking in a tree will allow us to break as soon as we hit an entry that doesn't match, making our worst case O(n). With this we can also merge entries more easily. Before we had to hope that matching refs were on the ends of our list, but with the tree we can search down to exact matches and merge them at insert time. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.13.5 |
|
#
d278850e |
| 29-Sep-2017 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: remove delayed_ref_node from ref_head This is just excessive information in the ref_head, and makes the code complicated. It is a relic from when we had the heads and the refs in
btrfs: remove delayed_ref_node from ref_head This is just excessive information in the ref_head, and makes the code complicated. It is a relic from when we had the heads and the refs in the same tree, which is no longer the case. With this removal I've cleaned up a bunch of the cruft around this old assumption as well. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.13, v4.12 |
|
#
7be07912 |
| 06-Jun-2017 |
Omar Sandoval <osandov@fb.com> |
Btrfs: return old and new total ref mods when adding delayed refs We need this to decide when to account pinned bytes. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: H
Btrfs: return old and new total ref mods when adding delayed refs We need this to decide when to account pinned bytes. Signed-off-by: Omar Sandoval <osandov@fb.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2 |
|
#
6df8cdf5 |
| 03-Mar-2017 |
Elena Reshetova <elena.reshetova@intel.com> |
btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a referen
btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.10.1, v4.10 |
|
#
f72ad18e |
| 30-Jan-2017 |
Liu Bo <bo.li.liu@oracle.com> |
Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head All we need is @delayed_refs, all callers have get it ahead of calling btrfs_find_delayed_ref_head since lock needs to be
Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head All we need is @delayed_refs, all callers have get it ahead of calling btrfs_find_delayed_ref_head since lock needs to be acquired firstly, there is no reason to deference it again inside the function. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
fef394f7 |
| 13-Dec-2016 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref btrfs_add_delayed_data_ref is always called with a NULL extent_op, so let's drop the argument. Signed-off-by: Je
btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref btrfs_add_delayed_data_ref is always called with a NULL extent_op, so let's drop the argument. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5f52a2c5 |
| 13-Dec-2016 |
Chris Mason <clm@fb.com> |
Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 Patches queued up by Filipe: The most important change is still the
Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 Patches queued up by Filipe: The most important change is still the fix for the extent tree corruption that happens due to balance when qgroups are enabled (a regression introduced in 4.7 by a fix for a regression from the last qgroups rework). This has been hitting SLE and openSUSE users and QA very badly, where transactions keep getting aborted when running delayed references leaving the root filesystem in RO mode and nearly unusable. There are fixes here that allow us to run xfstests again with the integrity checker enabled, which has been impossible since 4.8 (apparently I'm the only one running xfstests with the integrity checker enabled, which is useful to validate dirtied leafs, like checking if there are keys out of order, etc). The rest are just some trivial fixes, most of them tagged for stable, and two cleanups. Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28 |
|
#
1d57ee94 |
| 26-Oct-2016 |
Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> |
btrfs: improve delayed refs iterations This issue was found when I tried to delete a heavily reflinked file, when deleting such files, other transaction operation will not have a cha
btrfs: improve delayed refs iterations This issue was found when I tried to delete a heavily reflinked file, when deleting such files, other transaction operation will not have a chance to make progress, for example, start_transaction() will blocked in wait_current_trans(root) for long time, sometimes it even triggers soft lockups, and the time taken to delete such heavily reflinked file is also very large, often hundreds of seconds. Using perf top, it reports that: PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) --------------------------------------------------------------------------------------- 84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80 11.02% [kernel] [k] delay_tsc 0.79% [kernel] [k] _raw_spin_unlock_irq 0.78% [kernel] [k] _raw_spin_unlock_irqrestore 0.45% [kernel] [k] do_raw_spin_lock 0.18% [kernel] [k] __slab_alloc It seems __btrfs_run_delayed_refs() took most cpu time, after some debug work, I found it's select_delayed_ref() causing this issue, for a delayed head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but select_delayed_ref() will firstly try to iterate node list to find BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and waste much time. To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head, then in select_delayed_ref(), if this list is not empty, we can directly use nodes in this list. With this patch, it just took about 10~15 seconds to delte the same file. Now using perf top, it reports that: PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) ---------------------------------------------------------------------------------------- 20.74% [kernel] [k] _raw_spin_unlock_irqrestore 16.33% [kernel] [k] __slab_alloc 5.41% [kernel] [k] lock_acquired 4.42% [kernel] [k] lock_acquire 4.05% [kernel] [k] lock_release 3.37% [kernel] [k] _raw_spin_unlock_irq For normal files, this patch also gives help, at least we do not need to iterate whole list to found BTRFS_ADD_DELAYED_REF nodes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2a2a83de |
| 02-Nov-2016 |
Filipe Manana <fdmanana@suse.com> |
Btrfs: remove rb_node field from the delayed ref node structure After the last big change in the delayed references code that was needed for the last qgroups rework, the red black tree n
Btrfs: remove rb_node field from the delayed ref node structure After the last big change in the delayed references code that was needed for the last qgroups rework, the red black tree node field of struct btrfs_delayed_ref_node is no longer used, so just remove it, this helps us save some memory (since struct rb_node is 24 bytes on x86_64) for these structures. Signed-off-by: Filipe Manana <fdmanana@suse.com>
show more ...
|
Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1 |
|
#
e6571499 |
| 03-Aug-2016 |
Filipe Manana <fdmanana@suse.com> |
Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() No longer used as of commit 5846a3c26873 ("btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans").
Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() No longer used as of commit 5846a3c26873 ("btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans"). Signed-off-by: Filipe Manana <fdmanana@suse.com>
show more ...
|
Revision tags: v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1 |
|
#
01327610 |
| 19-May-2016 |
Nicholas D Steeves <nsteeves@gmail.com> |
btrfs: fix string and comment grammatical issues and typos Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
|