66182050 | 12-Oct-2021 | Nikolay Borisov <nborisov@suse.com>
btrfs: add additional parameters to btrfs_init_tree_ref/btrfs_init_data_ref
[ Upstream commit f42c5da6c12e990d8ec415199600b4d593c63bf5 ]
In order to make 'real_root' used only in ref-verify, the other users need the necessary context to perform the same checks that this member is currently used for. So add 'mod_root', which will contain the root on behalf of which a delayed ref was created, and a 'skip_qgroup' parameter, which will carry a callsite-specific override of skip_qgroup.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
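A rough userspace sketch of the kind of decision the new 'mod_root' and 'skip_qgroup' parameters make possible. All names below are hypothetical stand-ins, and the subvolume test is simplified (it accepts the FS tree id 5 and ids from 256 upwards, compared as signed so special negative ids are excluded); the real kernel helpers may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed object ids: 5 is the default FS tree, 256 the first subvolume id. */
#define FS_TREE_OBJECTID    5ULL
#define FIRST_FREE_OBJECTID 256ULL

/* Simplified "is this a subvolume (fs) tree?" check. The signed compare
 * keeps special negative ids (e.g. the relocation tree) out. */
static bool is_fstree_sketch(uint64_t root)
{
	return root == FS_TREE_OBJECTID ||
	       (int64_t)root >= (int64_t)FIRST_FREE_OBJECTID;
}

/* Hypothetical stand-in for the tree-ref part of a delayed ref. */
struct tree_ref_ctx {
	int level;
	uint64_t owning_root; /* tree that owns the block */
	uint64_t mod_root;    /* root on whose behalf the ref is made */
	bool skip_qgroup;     /* true: no qgroup accounting for this ref */
};

/* Qgroup accounting is skipped when the caller forces it, when the
 * owning root is not a subvolume tree, or when a non-zero mod_root is
 * not one either. A zero mod_root falls back to the owning root. */
static void init_tree_ref_sketch(struct tree_ref_ctx *ref, int level,
				 uint64_t root, uint64_t mod_root,
				 bool skip_qgroup)
{
	ref->level = level;
	ref->owning_root = root;
	ref->mod_root = mod_root ? mod_root : root;
	ref->skip_qgroup = skip_qgroup || !is_fstree_sketch(root) ||
			   (mod_root && !is_fstree_sketch(mod_root));
}
```

The point of the extra parameters is that a caller modifying a subvolume block on behalf of a non-subvolume tree (for example id 2, assumed here to be the extent tree) can now be detected without consulting 'real_root'.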
e19eb11f | 18-Dec-2020 | Josef Bacik <josef@toxicpanda.com>
btrfs: only let one thread pre-flush delayed refs in commit
I've been running a stress test that runs 20 workers, each in its own subvolume and each running an fsstress instance with 4 threads, which is 80 fsstress threads in total. In addition to this I'm running balance in the background as well as creating and deleting snapshots. This test normally takes around 12 hours to run, going slower and slower as the test goes on.
The reason is that fsstress sometimes runs fsync, and because we're messing with block groups we often fall through to btrfs_commit_transaction, so we often have 20-30 threads all calling btrfs_commit_transaction at the same time.
These all get stuck contending on the extent tree while they try to run delayed refs during the initial part of the commit.
This is suboptimal: the extent tree is effectively a single point of serialization, so we only want one thread acting on that tree at once to reduce lock contention.
Fix this by turning the flushing indicator into a bit, so that test_and_set_bit() can easily ensure that only one task does this initial flush.
Once we're into the transaction commit, only one thread runs delayed refs; it's just this initial pre-flush that is problematic. With this patch my stress test takes around 90 minutes to run, instead of 12 hours.
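The single-flusher idea can be sketched in userspace C11, with atomic_fetch_or() standing in for the kernel's test_and_set_bit(). The bit number, flag word and counter are hypothetical (the log does not name the real flag), and the actual delayed-ref running is replaced by a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PREFLUSH_BIT 0 /* hypothetical bit number */

/* Userspace stand-in for test_and_set_bit(): atomically set bit @nr in
 * *@flags and report whether it was already set. */
static bool test_and_set_bit_sketch(int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Counts how many callers performed the pre-flush. In this
 * single-transaction sketch only the caller that won the bit
 * increments it, so it is never written concurrently. */
static int preflush_runs;

/* Every committing task calls this; only the first one flushes. */
static void commit_preflush(atomic_ulong *trans_flags)
{
	if (test_and_set_bit_sketch(PREFLUSH_BIT, trans_flags))
		return; /* someone else is (or was) already flushing */
	preflush_runs++; /* placeholder for running the delayed refs */
}
```

Whichever task flips the bit from 0 to 1 does the flush; every later caller sees the bit already set and moves on to the rest of the commit.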
No memory barrier is needed for the flushing bit, because test_and_set_bit() is an ordered atomic operation, unlike a plain int. The transaction state read in btrfs_should_end_transaction could be affected as well, since it is not always accessed under the transaction lock. Per Nikolay's analysis in [1], a barrier is not necessary there either:
In should_end_transaction it's read without holding any locks. (U)
It's modified in btrfs_cleanup_transaction without holding the fs_info->trans_lock (U), but the STATE_ERROR flag is going to be set.
set in cleanup_transaction under fs_info->trans_lock (L)
set in btrfs_commit_trans to COMMIT_START under fs_info->trans_lock (L)
set in btrfs_commit_trans to COMMIT_DOING under fs_info->trans_lock (L)
set in btrfs_commit_trans to COMMIT_UNBLOCK under fs_info->trans_lock (L)
set in btrfs_commit_trans to COMMIT_COMPLETED without locks, but at this point the transaction is finished and fs_info->running_trans is NULL (U but irrelevant).
So by the looks of it we can have a concurrent READ racing with a WRITE, because reads don't take a lock. In this case what we want to ensure is that we see either the new or the old state. I consulted with Will Deacon, and he said that in such a case we'd want to annotate the accesses to ->state with (READ|WRITE)_ONCE to avoid a theoretical tear. In this case I don't think a tear could happen, but I imagine that at some point KCSAN would flag such an access as racy (which it is).
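To illustrate where the (READ|WRITE)_ONCE annotations go, here is a minimal sketch using simplified macros in the style of the kernel's old ACCESS_ONCE() (a single volatile access via GNU C __typeof__). The state names follow the list above, but the enum, struct and reader are hypothetical, and the sketch is single-threaded: it shows the annotation pattern, not the race itself.

```c
#include <assert.h>

/* Simplified annotations: force exactly one volatile load/store. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical subset of the transaction states listed above. */
enum trans_state {
	TS_RUNNING,
	TS_COMMIT_START,
	TS_COMMIT_DOING,
	TS_COMMIT_UNBLOCK,
	TS_COMMIT_COMPLETED,
};

struct transaction_sketch {
	enum trans_state state;
};

/* Writer: in the kernel this runs under fs_info->trans_lock, but the
 * store is still annotated because readers don't take that lock. */
static void set_state(struct transaction_sketch *t, enum trans_state s)
{
	WRITE_ONCE(t->state, s);
}

/* Lockless reader: one annotated load, so in practice (for an aligned,
 * int-sized field) it sees either the old or the new state. */
static int commit_started(struct transaction_sketch *t)
{
	return READ_ONCE(t->state) >= TS_COMMIT_START;
}
```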
[1] https://lore.kernel.org/linux-btrfs/e1fd5cc1-0f28-f670-69f4-e9958b4964e6@suse.com
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add comments regarding memory barrier ] Signed-off-by: David Sterba <dsterba@suse.com>
2187374f | 15-Jan-2021 | Josef Bacik <josef@toxicpanda.com>
btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
Currently we pass things around to figure out if we may be freeing data based on the state of the delayed refs head. This makes the accounting sort of confusing and hard to follow, as it's distinctly separate from the delayed ref heads stuff, but also depends on it entirely.
Fix this by explicitly adjusting the space_info->total_bytes_pinned in the delayed refs code. We now have two places where we modify this counter: once where we create and destroy the delayed refs, and once when we pin and unpin the extents. This means there is a slight overlap between delayed refs and the pin/unpin mechanisms, but this counter is simply used by the ENOSPC infrastructure to determine if we need to commit the transaction, so there's no adverse effect from this; we might simply commit thinking it will give us enough space when it might not.
CC: stable@vger.kernel.org # 5.10 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
7ec1536e | 15-Jan-2021 | Josef Bacik <josef@toxicpanda.com>
btrfs: handle space_info::total_bytes_pinned inside the delayed ref itself
commit 2187374f35fe9cadbddaa9fcf0c4121365d914e8 upstream.
Currently we pass things around to figure out if we may be freeing data based on the state of the delayed refs head. This makes the accounting sort of confusing and hard to follow, as it's distinctly separate from the delayed ref heads stuff, but also depends on it entirely.
Fix this by explicitly adjusting the space_info->total_bytes_pinned in the delayed refs code. We now have two places where we modify this counter: once where we create and destroy the delayed refs, and once when we pin and unpin the extents. This means there is a slight overlap between delayed refs and the pin/unpin mechanisms, but this counter is simply used by the ENOSPC infrastructure to determine if we need to commit the transaction, so there's no adverse effect from this; we might simply commit thinking it will give us enough space when it might not.
CC: stable@vger.kernel.org # 5.10 Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6ef03deb | 19-Jun-2019 | Josef Bacik <josef@toxicpanda.com>
btrfs: migrate the delayed refs rsv code
These belong with the delayed refs related code, not in extent-tree.c.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
c6e340bc | 20-Mar-2019 | David Sterba <dsterba@suse.com>
btrfs: remove unused parameter fs_info from btrfs_add_delayed_extent_op
Signed-off-by: David Sterba <dsterba@suse.com>
76675593 | 04-Apr-2019 | Qu Wenruo <wqu@suse.com>
btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_data_ref()
Just like btrfs_add_delayed_tree_ref(), use btrfs_ref to refactor btrfs_add_delayed_data_ref().
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
ed4f255b | 04-Apr-2019 | Qu Wenruo <wqu@suse.com>
btrfs: delayed-ref: Use btrfs_ref to refactor btrfs_add_delayed_tree_ref()
btrfs_add_delayed_tree_ref() has a longer and longer parameter list, and some callers like btrfs_inc_extent_ref() are using @owner as level for delayed tree ref.
Instead of making the parameter list longer, use btrfs_ref to refactor it, so each parameter assignment is self-explanatory without the dirty level/owner trick. This also provides the basis for later refactoring.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
b28b1f0c | 04-Apr-2019 | Qu Wenruo <wqu@suse.com>
btrfs: delayed-ref: Introduce better documented delayed ref structures
The current delayed ref interface has several problems:
- Longer and longer parameter lists
    bytenr
    num_bytes
    parent
    ---------- so far so good
    ref_root
    owner
    offset
    ---------- I don't feel good now
- Different interpretation of the same parameter
The @owner above is an inode number (u64) for data refs, while for tree refs it's a level (int).
They even have different ranges. For a level we only need 0 ~ 8, while for an inode number it's BTRFS_FIRST_FREE_OBJECTID ~ BTRFS_LAST_FREE_OBJECTID.
And @offset doesn't even make sense for tree refs.
Such parameter reuse may look clever as a hidden union, but it destroys code readability.
To solve both problems, we introduce a new structure, btrfs_ref:
- Structure instead of long parameter list: this makes later expansion easier, and is better documented.
- Use btrfs_ref::type to distinguish data and tree ref
- Use proper union to store data/tree ref specific structures.
- Use separate functions to fill data/tree ref data, with a common generic function to fill common bytenr/num_bytes members.
All parameters will find their place in btrfs_ref, and an extra member, @real_root, inspired by the ref-verify code, is newly introduced for later qgroup code, to record which tree triggered this extent modification.
This patch doesn't touch any code, but provides the basis for further refactoring.
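The layout described above (a type tag, common members, and an explicit union of data- and tree-specific members, each filled by its own helper) can be sketched as follows. Field names, types and helper names are simplified assumptions for illustration, not a copy of the kernel's struct btrfs_ref.

```c
#include <assert.h>
#include <stdint.h>

enum ref_type { REF_NOT_SET, REF_DATA, REF_METADATA };

struct data_ref_sketch {
	uint64_t ref_root; /* root that owns the file extent */
	uint64_t ino;      /* inode number: the old @owner for data refs */
	uint64_t offset;   /* file offset; has no meaning for tree refs */
};

struct tree_ref_sketch {
	int level;         /* tree level: the old @owner for tree refs */
	uint64_t root;
};

struct ref_sketch {
	enum ref_type type;   /* tells which union member is valid */
	int action;
	uint64_t bytenr;
	uint64_t len;
	uint64_t parent;
	uint64_t real_root;   /* tree that triggered the modification */
	union {               /* an explicit union instead of a hidden one */
		struct data_ref_sketch data_ref;
		struct tree_ref_sketch tree_ref;
	};
};

/* Common members go through one generic helper ... */
static void init_generic_ref(struct ref_sketch *ref, int action,
			     uint64_t bytenr, uint64_t len, uint64_t parent)
{
	ref->type = REF_NOT_SET;
	ref->action = action;
	ref->bytenr = bytenr;
	ref->len = len;
	ref->parent = parent;
	ref->real_root = 0;
}

/* ... and the type-specific members through exactly one of these. */
static void init_tree_ref(struct ref_sketch *ref, int level, uint64_t root)
{
	ref->type = REF_METADATA;
	ref->tree_ref.level = level;
	ref->tree_ref.root = root;
}

static void init_data_ref(struct ref_sketch *ref, uint64_t ref_root,
			  uint64_t ino, uint64_t offset)
{
	ref->type = REF_DATA;
	ref->data_ref.ref_root = ref_root;
	ref->data_ref.ino = ino;
	ref->data_ref.offset = offset;
}
```

A caller that used to pass a level through @owner now writes init_tree_ref(&ref, level, root), so the meaning of each value is visible at the call site.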
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18
1418bae1 | 23-Jan-2019 | Qu Wenruo <wqu@suse.com>
btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record
[BUG] btrfs/139 will fail with high probability if the testing machine (VM) has only 2G RAM.
As a result, the final write succeeds while it should fail with EDQUOT, and the fs ends up exceeding the quota limit by 16K.
The simplified reproducer (needs a VM with 2G RAM):
    $ mkfs.btrfs -f $dev
    $ mount $dev $mnt
    $ btrfs subv create $mnt/subv
    $ btrfs quota enable $mnt
    $ btrfs quota rescan -w $mnt
    $ btrfs qgroup limit -e 1G $mnt/subv
    $ for i in $(seq -w 1 8); do
          xfs_io -f -c "pwrite 0 128M" $mnt/subv/file_$i > /dev/null
          echo "file $i written" > /dev/kmsg
      done
    $ sync
    $ btrfs qgroup show -pcre --raw $mnt
The last pwrite will not trigger EDQUOT, and the final 'qgroup show' will show something like:
    qgroupid         rfer         excl     max_rfer     max_excl  parent  child
    --------         ----         ----     --------     --------  ------  -----
    0/5             16384        16384         none         none  ---     ---
    0/256      1073758208   1073758208         none   1073741824  ---     ---
And 1073758208 is larger than the 1073741824 limit, by 16384 bytes (the 16K above).
[CAUSE] It's a bug in btrfs qgroup data reserved space management.
For the quota limit, we must ensure that:
    reserved (data + metadata) + rfer/excl <= limit
Since rfer/excl is only updated at transaction commit time, reserved space needs special care.
One important part of reserved space is data: for a new data extent written to disk, we still need to hold the reserved space until the rfer/excl numbers get updated.
Originally, when an ordered extent finished, we migrated the reserved qgroup data space from the extent_io tree to the delayed ref head of the data extent, expecting the delayed ref to be cleaned up only at transaction commit time.
However, on a machine with little RAM, memory pressure can cause dirty pages to be flushed back to disk without committing a transaction.
The related events will be something like:
    file 1 written
    btrfs_finish_ordered_io: ino=258 ordered offset=0 len=54947840
    btrfs_finish_ordered_io: ino=258 ordered offset=54947840 len=5636096
    btrfs_finish_ordered_io: ino=258 ordered offset=61153280 len=57344
    btrfs_finish_ordered_io: ino=258 ordered offset=61210624 len=8192
    btrfs_finish_ordered_io: ino=258 ordered offset=60583936 len=569344
    cleanup_ref_head: num_bytes=54947840
    cleanup_ref_head: num_bytes=5636096
    cleanup_ref_head: num_bytes=569344
    cleanup_ref_head: num_bytes=57344
    cleanup_ref_head: num_bytes=8192
    ^^^^^^^^^^^^^^^^ This will free qgroup data reserved space
    file 2 written
    ...
    file 8 written
    cleanup_ref_head: num_bytes=8192
    ...
    btrfs_commit_transaction <<< the only transaction committed during the test
By the time file 2 is written, we have already freed 128M of reserved qgroup data space for ino 258, so later writes won't trigger EDQUOT.
This allows us to write more data than the qgroup limit.
In my 2G RAM VM, it could reach about 1.2G before hitting EDQUOT.
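A toy model of the invariant and of the failure mode, assuming a single qgroup with only an exclusive limit (as set by 'btrfs qgroup limit -e'). The struct and functions are hypothetical; they only show why releasing the reservation before 'excl' is updated lets writes slip past the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-qgroup counters, in bytes. */
struct qgroup_sketch {
	uint64_t excl;     /* only updated at transaction commit */
	uint64_t reserved; /* reserved (data + metadata), not yet in excl */
	uint64_t max_excl; /* limit, e.g. from 'btrfs qgroup limit -e' */
};

/* Reserve @bytes unless that would break
 * reserved + excl <= max_excl (the EDQUOT case). */
static bool qgroup_try_reserve(struct qgroup_sketch *qg, uint64_t bytes)
{
	if (qg->reserved + qg->excl + bytes > qg->max_excl)
		return false;
	qg->reserved += bytes;
	return true;
}

/* Intended flow: at commit the bytes move from reserved to excl, so the
 * sum checked above never drops in between. */
static void qgroup_commit(struct qgroup_sketch *qg, uint64_t written)
{
	qg->reserved -= written;
	qg->excl += written;
}

/* The bug: the reservation is dropped early (cleanup_ref_head) while
 * excl is not updated until a commit that has not happened yet. */
static void qgroup_release_early(struct qgroup_sketch *qg, uint64_t written)
{
	qg->reserved -= written;
}
```

With a 1G limit, eight 128M reservations fit exactly and any further request is refused; if each reservation is released early instead, a ninth 128M request is accepted even though nothing has been committed.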
[FIX] By moving the reserved qgroup data space from btrfs_delayed_ref_head to btrfs_qgroup_extent_record, we ensure that reserved qgroup data space won't be freed halfway, before the transaction commits, which fixes the problem.
Fixes: f64d5ca86821 ("btrfs: delayed_ref: Add new function to record reserved space into delayed ref") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7
d7baffda | 03-Dec-2018 | Josef Bacik <jbacik@fb.com>
btrfs: add btrfs_delete_ref_head helper
We do this dance in both cleanup_ref_head and check_ref_cleanup; unify it into a helper and clean up the calling functions.
Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14
9e920a6f | 11-Oct-2018 | Lu Fengqi <lufq.fnst@cn.fujitsu.com>
btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock
Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btrfs_delayed_ref_lock().
No functional change.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
5637c74b | 11-Oct-2018 | Lu Fengqi <lufq.fnst@cn.fujitsu.com>
btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head
Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btrfs_select_ref_head(). No functional change.
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5
e3d03965 | 22-Aug-2018 | Liu Bo <bo.liu@linux.alibaba.com>
Btrfs: delayed-refs: use rb_first_cached for ref_tree
rb_first_cached() trades an extra pointer "leftmost" for doing the same job as rb_first() but in O(1).
Functions manipulating href->ref_tree need to get the first entry, so this converts href->ref_tree to use rb_first_cached().
For more details about the optimization see patch "Btrfs: delayed-refs: use rb_first_cached for href_root".
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
5c9d028b | 22-Aug-2018 | Liu Bo <bo.liu@linux.alibaba.com>
Btrfs: delayed-refs: use rb_first_cached for href_root
rb_first_cached() trades an extra pointer "leftmost" for doing the same job as rb_first() but in O(1).
Functions manipulating href_root need to get the first entry, so this converts href_root to use rb_first_cached().
This patch is the first in a sequence of similar updates to other rbtrees, and this is an analysis of the expected behaviour and improvements.
There's a common pattern:
    while (node = rb_first) {
        entry = rb_entry(node);
        next = rb_next(node);
        rb_erase(node);
        cleanup(entry);
    }
rb_first needs to traverse the tree up to logN depth, rb_erase can completely reshuffle the tree. With the caching we'll skip the traversal in rb_first. That's a cached memory access vs looped pointer dereference trade-off that IMHO has a clear winner.
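To make the trade-off concrete, here is a sketch of leftmost caching on a plain, unbalanced binary search tree. It is not the kernel rbtree: there is no rebalancing, and only removal of the first node (the drain-loop case above) is implemented. The idea mirrors struct rb_root_cached: insertion remembers whether it only ever went left, and removing the first node refreshes the cached pointer, so finding the first entry is a pointer read instead of a walk down the left spine. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int key;
	struct node *left, *right, *parent;
};

/* Tree root plus cached leftmost node. */
struct cached_tree {
	struct node *root;
	struct node *leftmost;
};

/* What a plain rb_first() must do: walk the left spine, O(depth). */
static struct node *first_uncached(const struct cached_tree *t)
{
	struct node *n = t->root;

	while (n && n->left)
		n = n->left;
	return n;
}

/* What rb_first_cached() does: one pointer read, O(1). */
static struct node *first_cached(const struct cached_tree *t)
{
	return t->leftmost;
}

/* Unbalanced insert; the new node is leftmost iff we never went right. */
static void insert(struct cached_tree *t, struct node *n)
{
	struct node **link = &t->root, *parent = NULL;
	int leftmost = 1;

	while (*link) {
		parent = *link;
		if (n->key < parent->key) {
			link = &parent->left;
		} else {
			link = &parent->right;
			leftmost = 0;
		}
	}
	n->left = n->right = NULL;
	n->parent = parent;
	*link = n;
	if (leftmost)
		t->leftmost = n;
}

/* Remove the first node (the drain-loop case) and refresh the cache. */
static struct node *pop_first(struct cached_tree *t)
{
	struct node *n = t->leftmost, *succ;

	if (!n)
		return NULL;
	assert(!n->left); /* the leftmost node never has a left child */

	/* Splice n's right subtree (possibly empty) into n's place; n is
	 * either the root or its parent's left child. */
	if (n->right)
		n->right->parent = n->parent;
	if (n->parent)
		n->parent->left = n->right;
	else
		t->root = n->right;

	/* New first node: minimum of that subtree, else n's old parent. */
	succ = n->right;
	if (succ) {
		while (succ->left)
			succ = succ->left;
	} else {
		succ = n->parent;
	}
	t->leftmost = succ;
	return n;
}
```

Note that pop_first() still walks part of the tree to refresh the cache: caching moves the lookup cost into the update path rather than removing all traversal.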
Measurements show there's not much difference in a sample tree with 10000 nodes: 4.5s / rb_first and 4.8s / rb_first_cached. Real effects of caching and pointer chasing are unpredictable though.
Further optimizations can be done to avoid the expensive rb_erase step. In some cases it's OK to process the nodes in any order, so the tree can be traversed in post-order, without rebalancing the children nodes, just calling free. Care must be taken regarding the next node.
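The kernel provides rbtree_postorder_for_each_entry_safe() for this kind of teardown. The sketch below shows the underlying idea recursively on a plain binary search tree (the node type and helpers are hypothetical): children are released before their parent, so no rebalancing work is done and no node is read after it has been freed.

```c
#include <assert.h>
#include <stdlib.h>

struct tnode {
	int key;
	struct tnode *left, *right;
};

/* Plain (unbalanced) BST insert, only used to build a demo tree. */
static struct tnode *tinsert(struct tnode *root, int key)
{
	if (!root) {
		struct tnode *n = calloc(1, sizeof(*n));

		assert(n);
		n->key = key;
		return n;
	}
	if (key < root->key)
		root->left = tinsert(root->left, key);
	else
		root->right = tinsert(root->right, key);
	return root;
}

/* Post-order teardown: free both subtrees, then the node itself.
 * Returns the number of nodes freed. */
static size_t free_postorder(struct tnode *n)
{
	size_t count;

	if (!n)
		return 0;
	count = free_postorder(n->left) + free_postorder(n->right);
	free(n); /* children are already gone; n is not touched again */
	return count + 1;
}
```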
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog from mail discussions ] Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3
88a979c6 | 20-Jun-2018 | Nikolay Borisov <nborisov@suse.com>
btrfs: Remove fs_info from btrfs_add_delayed_data_ref
This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
44e1c47d | 20-Jun-2018 | Nikolay Borisov <nborisov@suse.com>
btrfs: Remove fs_info from btrfs_add_delayed_tree_ref
This function is always called with a valid transaction handle from where fs_info can be referenced. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v4.17.2, v4.17.1, v4.17
be97f133 | 19-Apr-2018 | Nikolay Borisov <nborisov@suse.com>
btrfs: Drop fs_info parameter from btrfs_merge_delayed_refs
It's provided by the transaction handle.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
41d0bd3b | 04-Apr-2018 | Nikolay Borisov <nborisov@suse.com>
btrfs: Drop delayed_refs argument from btrfs_check_delayed_seq
It's used to print its pointer in a debug statement but doesn't really bring any useful information to the error message.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
5e388e95 | 18-Apr-2018 | Nikolay Borisov <nborisov@suse.com>
btrfs: Fix race condition between delayed refs and blockgroup removal
When the delayed refs for a head are all run, cleanup_ref_head is eventually called, which (in case of deletion) obtains a reference to the relevant btrfs_space_info struct by querying the bg for the range. This is problematic because, when the last extent of a bg is deleted, a race window emerges between removal of that bg and the subsequent invocation of cleanup_ref_head. This can result in 'cache' being NULL and either a NULL pointer dereference or an assertion failure.
    task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
    RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
    RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
    RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
    RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
    R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
    R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
    FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
     btrfs_run_delayed_refs+0x68/0x250 [btrfs]
     btrfs_should_end_transaction+0x42/0x60 [btrfs]
     btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
     btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
     evict+0xc6/0x190
     do_unlinkat+0x19c/0x300
     do_syscall_64+0x74/0x140
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x7fbf589c57a7
To fix this, introduce a new flag, "is_system", in head_ref structs, which is populated at insertion time. This decouples querying for the space_info from querying the possibly deleted bg.
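A minimal sketch of the fix's idea: record what kind of space the head belongs to when it is created, so cleanup never needs a block group lookup that may return NULL. The flag values are assumed to mirror the BTRFS_BLOCK_GROUP_{DATA,SYSTEM,METADATA} bits and the chunk tree id is assumed to be 3; the struct and functions are hypothetical, not the kernel's head_ref code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror BTRFS_BLOCK_GROUP_{DATA,SYSTEM,METADATA}. */
#define BG_DATA     (1ULL << 0)
#define BG_SYSTEM   (1ULL << 1)
#define BG_METADATA (1ULL << 2)

#define CHUNK_TREE_OBJECTID 3ULL /* assumed chunk tree id */

/* Hypothetical, trimmed-down delayed ref head. */
struct head_ref_sketch {
	uint64_t bytenr;
	bool is_data;
	bool is_system; /* decided once, when the head is inserted */
};

static void insert_head_sketch(struct head_ref_sketch *h, uint64_t bytenr,
			       bool is_data, uint64_t ref_root)
{
	h->bytenr = bytenr;
	h->is_data = is_data;
	h->is_system = (ref_root == CHUNK_TREE_OBJECTID);
}

/* At cleanup time the space_info type comes from the head alone, so a
 * block group removed in the meantime cannot cause a NULL lookup here. */
static uint64_t head_space_flags(const struct head_ref_sketch *h)
{
	if (h->is_data)
		return BG_DATA;
	if (h->is_system)
		return BG_SYSTEM;
	return BG_METADATA;
}
```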
Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting") CC: stable@vger.kernel.org # 4.14+ Suggested-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
9888c340 | 03-Apr-2018 | David Sterba <dsterba@suse.com>
btrfs: replace GPL boilerplate by SPDX -- headers
Remove GPL boilerplate text (long, short, one-line) and keep the rest, i.e. personal, company or original source copyright statements. Add the SPDX header.
Unify the include protection macros to match the file names.
Signed-off-by: David Sterba <dsterba@suse.com>
Revision tags: v4.16
e67c718b | 19-Feb-2018 | David Sterba <dsterba@suse.com>
btrfs: add more __cold annotations
The __cold functions are placed in a special section, as they're expected to be called rarely. This could help i-cache prefetches or help the compiler decide which branches are more/less likely to be taken, without any other annotations needed.
Though we can't add more __exit annotations, it's still possible to add __cold (which __exit also implies). That way the following function categories are tagged:
- printf wrappers, error messages
- exit helpers
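With GCC and Clang, __cold corresponds to the cold function attribute. A userspace sketch with a hypothetical SKETCH_COLD macro and error-reporting helper: paths that call the cold function are treated as unlikely, and the compiler may place the function in a separate text subsection (for example .text.unlikely) away from hot code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_COLD __attribute__((cold))
#else
#define SKETCH_COLD
#endif

static int error_count;

/* Error-message wrapper: rarely called, so it is marked cold. */
static void SKETCH_COLD report_error(const char *fmt, ...)
{
	va_list ap;

	error_count++;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Hot path: the branch that calls report_error() is treated as unlikely. */
static int checked_div(int a, int b, int *out)
{
	if (b == 0) {
		report_error("checked_div: division by zero (a=%d)\n", a);
		return -1;
	}
	*out = a / b;
	return 0;
}
```

No __builtin_expect() is needed in checked_div(): the attribute on the callee is what marks the error branch as cold.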
Signed-off-by: David Sterba <dsterba@suse.com>