History log of /openbmc/linux/fs/btrfs/delayed-ref.h (Results 101 – 125 of 160)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18
# 1418bae1 23-Jan-2019 Qu Wenruo <wqu@suse.com>

btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record

[BUG]
Btrfs/139 will fail with a high probability if the testing machine (VM)
h

btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record

[BUG]
Btrfs/139 will fail with a high probability if the testing machine (VM)
has only 2G RAM.

Resulting the final write success while it should fail due to EDQUOT,
and the fs will have quota exceeding the limit by 16K.

The simplified reproducer will be: (needs a 2G ram VM)

$ mkfs.btrfs -f $dev
$ mount $dev $mnt

$ btrfs subv create $mnt/subv
$ btrfs quota enable $mnt
$ btrfs quota rescan -w $mnt
$ btrfs qgroup limit -e 1G $mnt/subv

$ for i in $(seq -w 1 8); do
xfs_io -f -c "pwrite 0 128M" $mnt/subv/file_$i > /dev/null
echo "file $i written" > /dev/kmsg
done
$ sync
$ btrfs qgroup show -pcre --raw $mnt

The last pwrite will not trigger EDQUOT and final 'qgroup show' will
show something like:

qgroupid rfer excl max_rfer max_excl parent child
-------- ---- ---- -------- -------- ------ -----
0/5 16384 16384 none none --- ---
0/256 1073758208 1073758208 none 1073741824 --- ---

And 1073758208 is larger than
> 1073741824.

[CAUSE]
It's a bug in btrfs qgroup data reserved space management.

For quota limit, we must ensure that:
reserved (data + metadata) + rfer/excl <= limit

Since rfer/excl is only updated at transaction commmit time, reserved
space needs to be taken special care.

One important part of reserved space is data, and for a new data extent
written to disk, we still need to take the reserved space until
rfer/excl numbers get updated.

Originally when an ordered extent finishes, we migrate the reserved
qgroup data space from extent_io tree to delayed ref head of the data
extent, expecting delayed ref will only be cleaned up at commit
transaction time.

However for small RAM machine, due to memory pressure dirty pages can be
flushed back to disk without committing a transaction.

The related events will be something like:

file 1 written
btrfs_finish_ordered_io: ino=258 ordered offset=0 len=54947840
btrfs_finish_ordered_io: ino=258 ordered offset=54947840 len=5636096
btrfs_finish_ordered_io: ino=258 ordered offset=61153280 len=57344
btrfs_finish_ordered_io: ino=258 ordered offset=61210624 len=8192
btrfs_finish_ordered_io: ino=258 ordered offset=60583936 len=569344
cleanup_ref_head: num_bytes=54947840
cleanup_ref_head: num_bytes=5636096
cleanup_ref_head: num_bytes=569344
cleanup_ref_head: num_bytes=57344
cleanup_ref_head: num_bytes=8192
^^^^^^^^^^^^^^^^ This will free qgroup data reserved space
file 2 written
...
file 8 written
cleanup_ref_head: num_bytes=8192
...
btrfs_commit_transaction <<< the only transaction committed during
the test

When file 2 is written, we have already freed 128M reserved qgroup data
space for ino 258. Thus later write won't trigger EDQUOT.

This allows us to write more data beyond qgroup limit.

In my 2G ram VM, it could reach about 1.2G before hitting EDQUOT.

[FIX]
By moving reserved qgroup data space from btrfs_delayed_ref_head to
btrfs_qgroup_extent_record, we can ensure that reserved qgroup data
space won't be freed half way before commit transaction, thus fix the
problem.

Fixes: f64d5ca86821 ("btrfs: delayed_ref: Add new function to record reserved space into delayed ref")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7
# d7baffda 03-Dec-2018 Josef Bacik <jbacik@fb.com>

btrfs: add btrfs_delete_ref_head helper

We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
into a helper and cleanup the calling functions.

Reviewed-by: Omar S

btrfs: add btrfs_delete_ref_head helper

We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
into a helper and cleanup the calling functions.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14
# 9e920a6f 11-Oct-2018 Lu Fengqi <lufq.fnst@cn.fujitsu.com>

btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock

Since trans is only used for referring to delayed_refs, there is no need
to pass it instead of delayed_refs to bt

btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock

Since trans is only used for referring to delayed_refs, there is no need
to pass it instead of delayed_refs to btrfs_delayed_ref_lock().

No functional change.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 5637c74b 11-Oct-2018 Lu Fengqi <lufq.fnst@cn.fujitsu.com>

btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head

Since trans is only used for referring to delayed_refs, there is no need
to pass it instead of delayed_refs to btr

btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head

Since trans is only used for referring to delayed_refs, there is no need
to pass it instead of delayed_refs to btrfs_select_ref_head(). No
functional change.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5
# e3d03965 22-Aug-2018 Liu Bo <bo.liu@linux.alibaba.com>

Btrfs: delayed-refs: use rb_first_cached for ref_tree

rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

Functions manipulati

Btrfs: delayed-refs: use rb_first_cached for ref_tree

rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

Functions manipulating href->ref_tree need to get the first entry, this
converts href->ref_tree to use rb_first_cached().

For more details about the optimization see patch "Btrfs: delayed-refs:
use rb_first_cached for href_root".

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 5c9d028b 22-Aug-2018 Liu Bo <bo.liu@linux.alibaba.com>

Btrfs: delayed-refs: use rb_first_cached for href_root

rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

Functions manipulat

Btrfs: delayed-refs: use rb_first_cached for href_root

rb_first_cached() trades an extra pointer "leftmost" for doing the same
job as rb_first() but in O(1).

Functions manipulating href_root need to get the first entry, this
converts href_root to use rb_first_cached().

This patch is first in the sequenct of similar updates to other rbtrees
and this is analysis of the expected behaviour and improvements.

There's a common pattern:

while (node = rb_first) {
entry = rb_entry(node)
next = rb_next(node)
rb_erase(node)
cleanup(entry)
}

rb_first needs to traverse the tree up to logN depth, rb_erase can
completely reshuffle the tree. With the caching we'll skip the traversal
in rb_first. That's a cached memory access vs looped pointer
dereference trade-off that IMHO has a clear winner.

Measurements show there's not much difference in a sample tree with
10000 nodes: 4.5s / rb_first and 4.8s / rb_first_cached. Real effects of
caching and pointer chasing are unpredictable though.

Further optimzations can be done to avoid the expensive rb_erase step.
In some cases it's ok to process the nodes in any order, so the tree can
be traversed in post-order, not rebalancing the children nodes and just
calling free. Care must be taken regarding the next node.

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog from mail discussions ]
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3
# 88a979c6 20-Jun-2018 Nikolay Borisov <nborisov@suse.com>

btrfs: Remove fs_info from btrfs_add_delayed_data_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

btrfs: Remove fs_info from btrfs_add_delayed_data_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 44e1c47d 20-Jun-2018 Nikolay Borisov <nborisov@suse.com>

btrfs: Remove fs_info from btrfs_add_delayed_tree_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

btrfs: Remove fs_info from btrfs_add_delayed_tree_ref

This function is always called with a valid transaction handle from
where fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.17.2, v4.17.1, v4.17
# be97f133 19-Apr-2018 Nikolay Borisov <nborisov@suse.com>

btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs

It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Ster

btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs

It's provided by the transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 41d0bd3b 04-Apr-2018 Nikolay Borisov <nborisov@suse.com>

btrfs: Drop delayed_refs argument from btrfs_check_delayed_seq

It's used to print its pointer in a debug statement but doesn't really
bring any useful information to the error message.

btrfs: Drop delayed_refs argument from btrfs_check_delayed_seq

It's used to print its pointer in a debug statement but doesn't really
bring any useful information to the error message.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 5e388e95 18-Apr-2018 Nikolay Borisov <nborisov@suse.com>

btrfs: Fix race condition between delayed refs and blockgroup removal

When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obta

btrfs: Fix race condition between delayed refs and blockgroup removal

When the delayed refs for a head are all run, eventually
cleanup_ref_head is called which (in case of deletion) obtains a
reference for the relevant btrfs_space_info struct by querying the bg
for the range. This is problematic because when the last extent of a
bg is deleted a race window emerges between removal of that bg and the
subsequent invocation of cleanup_ref_head. This can result in cache being null
and either a null pointer dereference or assertion failure.

task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
btrfs_run_delayed_refs+0x68/0x250 [btrfs]
btrfs_should_end_transaction+0x42/0x60 [btrfs]
btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
evict+0xc6/0x190
do_unlinkat+0x19c/0x300
do_syscall_64+0x74/0x140
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fbf589c57a7

To fix this, introduce a new flag "is_system" to head_ref structs,
which is populated at insertion time. This allows to decouple the
querying for the spaceinfo from querying the possibly deleted bg.

Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
CC: stable@vger.kernel.org # 4.14+
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 9888c340 03-Apr-2018 David Sterba <dsterba@suse.com>

btrfs: replace GPL boilerplate by SPDX -- headers

Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Ad

btrfs: replace GPL boilerplate by SPDX -- headers

Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.

Unify the include protection macros to match the file names.

Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.16
# e67c718b 19-Feb-2018 David Sterba <dsterba@suse.com>

btrfs: add more __cold annotations

The __cold functions are placed to a special section, as they're
expected to be called rarely. This could help i-cache prefetches or help
compiler

btrfs: add more __cold annotations

The __cold functions are placed to a special section, as they're
expected to be called rarely. This could help i-cache prefetches or help
compiler to decide which branches are more/less likely to be taken
without any other annotations needed.

Though we can't add more __exit annotations, it's still possible to add
__cold (that's also added with __exit). That way the following function
categories are tagged:

- printf wrappers, error messages
- exit helpers

Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.15, v4.13.16, v4.14
# f5c29bd9 02-Nov-2017 Liu Bo <bo.li.liu@oracle.com>

Btrfs: add __init macro to btrfs init functions

Adding __init macro gives kernel a hint that this function is only used
during the initialization phase and its memory resources can be fr

Btrfs: add __init macro to btrfs init functions

Adding __init macro gives kernel a hint that this function is only used
during the initialization phase and its memory resources can be freed up
after.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 0e0adbcf 19-Oct-2017 Josef Bacik <josef@toxicpanda.com>

btrfs: track refs in a rb_tree instead of a list

If we get a significant amount of delayed refs for a single block (think
modifying multiple snapshots) we can end up spending an ungodly

btrfs: track refs in a rb_tree instead of a list

If we get a significant amount of delayed refs for a single block (think
modifying multiple snapshots) we can end up spending an ungodly amount
of time looping through all of the entries trying to see if they can be
merged. This is because we only add them to a list, so we have O(2n)
for every ref head. This doesn't make any sense as we likely have refs
for different roots, and so they cannot be merged. Tracking in a tree
will allow us to break as soon as we hit an entry that doesn't match,
making our worst case O(n).

With this we can also merge entries more easily. Before we had to hope
that matching refs were on the ends of our list, but with the tree we
can search down to exact matches and merge them at insert time.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.13.5
# d278850e 29-Sep-2017 Josef Bacik <josef@toxicpanda.com>

btrfs: remove delayed_ref_node from ref_head

This is just excessive information in the ref_head, and makes the code
complicated. It is a relic from when we had the heads and the refs in

btrfs: remove delayed_ref_node from ref_head

This is just excessive information in the ref_head, and makes the code
complicated. It is a relic from when we had the heads and the refs in
the same tree, which is no longer the case. With this removal I've
cleaned up a bunch of the cruft around this old assumption as well.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.13, v4.12
# 7be07912 06-Jun-2017 Omar Sandoval <osandov@fb.com>

Btrfs: return old and new total ref mods when adding delayed refs

We need this to decide when to account pinned bytes.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: H

Btrfs: return old and new total ref mods when adding delayed refs

We need this to decide when to account pinned bytes.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2
# 6df8cdf5 03-Mar-2017 Elena Reshetova <elena.reshetova@intel.com>

btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a referen

btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v4.10.1, v4.10
# f72ad18e 30-Jan-2017 Liu Bo <bo.li.liu@oracle.com>

Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head

All we need is @delayed_refs, all callers have get it ahead of calling
btrfs_find_delayed_ref_head since lock needs to be

Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head

All we need is @delayed_refs, all callers have get it ahead of calling
btrfs_find_delayed_ref_head since lock needs to be acquired firstly,
there is no reason to deference it again inside the function.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# fef394f7 13-Dec-2016 Jeff Mahoney <jeffm@suse.com>

btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref

btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.

Signed-off-by: Je

btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref

btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 5f52a2c5 13-Dec-2016 Chris Mason <clm@fb.com>

Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10

Patches queued up by Filipe:

The most important change is still the

Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10

Patches queued up by Filipe:

The most important change is still the fix for the extent tree
corruption that happens due to balance when qgroups are enabled (a
regression introduced in 4.7 by a fix for a regression from the last
qgroups rework). This has been hitting SLE and openSUSE users and QA
very badly, where transactions keep getting aborted when running
delayed references leaving the root filesystem in RO mode and nearly
unusable. There are fixes here that allow us to run xfstests again
with the integrity checker enabled, which has been impossible since 4.8
(apparently I'm the only one running xfstests with the integrity
checker enabled, which is useful to validate dirtied leafs, like
checking if there are keys out of order, etc). The rest are just some
trivial fixes, most of them tagged for stable, and two cleanups.

Signed-off-by: Chris Mason <clm@fb.com>

show more ...


Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28
# 1d57ee94 26-Oct-2016 Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>

btrfs: improve delayed refs iterations

This issue was found when I tried to delete a heavily reflinked file,
when deleting such files, other transaction operation will not have a
cha

btrfs: improve delayed refs iterations

This issue was found when I tried to delete a heavily reflinked file,
when deleting such files, other transaction operation will not have a
chance to make progress, for example, start_transaction() will blocked
in wait_current_trans(root) for long time, sometimes it even triggers
soft lockups, and the time taken to delete such heavily reflinked file
is also very large, often hundreds of seconds. Using perf top, it reports
that:

PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
---------------------------------------------------------------------------------------
84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80
11.02% [kernel] [k] delay_tsc
0.79% [kernel] [k] _raw_spin_unlock_irq
0.78% [kernel] [k] _raw_spin_unlock_irqrestore
0.45% [kernel] [k] do_raw_spin_lock
0.18% [kernel] [k] __slab_alloc
It seems __btrfs_run_delayed_refs() took most cpu time, after some debug
work, I found it's select_delayed_ref() causing this issue, for a delayed
head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but
select_delayed_ref() will firstly try to iterate node list to find
BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and
waste much time.

To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head,
then in select_delayed_ref(), if this list is not empty, we can directly use
nodes in this list. With this patch, it just took about 10~15 seconds to
delte the same file. Now using perf top, it reports that:

PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
----------------------------------------------------------------------------------------

20.74% [kernel] [k] _raw_spin_unlock_irqrestore
16.33% [kernel] [k] __slab_alloc
5.41% [kernel] [k] lock_acquired
4.42% [kernel] [k] lock_acquire
4.05% [kernel] [k] lock_release
3.37% [kernel] [k] _raw_spin_unlock_irq

For normal files, this patch also gives help, at least we do not need to
iterate whole list to found BTRFS_ADD_DELAYED_REF nodes.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 2a2a83de 02-Nov-2016 Filipe Manana <fdmanana@suse.com>

Btrfs: remove rb_node field from the delayed ref node structure

After the last big change in the delayed references code that was needed
for the last qgroups rework, the red black tree n

Btrfs: remove rb_node field from the delayed ref node structure

After the last big change in the delayed references code that was needed
for the last qgroups rework, the red black tree node field of struct
btrfs_delayed_ref_node is no longer used, so just remove it, this helps
us save some memory (since struct rb_node is 24 bytes on x86_64) for
these structures.

Signed-off-by: Filipe Manana <fdmanana@suse.com>

show more ...


Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1
# e6571499 03-Aug-2016 Filipe Manana <fdmanana@suse.com>

Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve()

No longer used as of commit 5846a3c26873 ("btrfs: qgroup: Fix a race in
delayed_ref which leads to abort trans").

Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve()

No longer used as of commit 5846a3c26873 ("btrfs: qgroup: Fix a race in
delayed_ref which leads to abort trans").

Signed-off-by: Filipe Manana <fdmanana@suse.com>

show more ...


Revision tags: v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1
# 01327610 19-May-2016 Nicholas D Steeves <nsteeves@gmail.com>

btrfs: fix string and comment grammatical issues and typos

Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>


1234567