#
1fc28d8e |
| 17-May-2018 |
Liu Bo <bo.liu@linux.alibaba.com> |
Btrfs: move get root out of btrfs_search_slot to a helper
It's good to have a helper instead of having all get-root details open-coded. The new helper locks (if necessary) and sets root node of the
Btrfs: move get root out of btrfs_search_slot to a helper
It's good to have a helper instead of having all get-root details open-coded. The new helper locks (if necessary) and sets root node of the path.
Also invert the checks to make the code flow easier to read. There is no functional change in this commit.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
e6a1d6fd |
| 17-May-2018 |
Liu Bo <bo.liu@linux.alibaba.com> |
Btrfs: use more straightforward extent_buffer_uptodate check
If parent_transid "0" is passed to btrfs_buffer_uptodate(), btrfs_buffer_uptodate() is equivalent to extent_buffer_uptodate(), but extent
Btrfs: use more straightforward extent_buffer_uptodate check
If parent_transid "0" is passed to btrfs_buffer_uptodate(), btrfs_buffer_uptodate() is equivalent to extent_buffer_uptodate(), but extent_buffer_uptodate() is preferred since we don't have to look into verify_parent_transid().
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ca19b4a6 |
| 17-May-2018 |
Liu Bo <bo.liu@linux.alibaba.com> |
Btrfs: remove superfluous free_extent_buffer in read_block_for_search
read_block_for_search() can be simplified as:
tmp = find_extent_buffer(); if (tmp) return;
...
free_extent_buffer(); read_
Btrfs: remove superfluous free_extent_buffer in read_block_for_search
read_block_for_search() can be simplified as:
tmp = find_extent_buffer(); if (tmp) return;
...
free_extent_buffer(); read_tree_block();
Apparently, @tmp must be NULL at this point, free_extent_buffer() is not needed.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
02a3307a |
| 15-May-2018 |
Liu Bo <bo.liu@linux.alibaba.com> |
btrfs: fix reading stale metadata blocks after degraded raid1 mounts
If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e.
btrfs: fix reading stale metadata blocks after degraded raid1 mounts
If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e.
btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child
Unfortunately, the parent lock got released before reading child, so commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had used 0 as parent transid to read the child block. It forces read_tree_block() not to check if parent transid is different with the generation id of the child that it reads out from disk.
A simple PoC is included in btrfs/124,
0. A two-disk raid1 btrfs,
1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
2. Mount this filesystem and put it in use, after a while, device tree's root got COW but block A hasn't been allocated/overwritten yet.
3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list.
4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem.
5. Umount it and mount it again normally (with both disks), since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2 which does NOT have the latest metadata might be read for block A, then we got a stale block A.
6. As parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so the future search won't bother to read disk again, which means it'll make changes on this stale one and make it dirty and flush it onto disk.
To avoid the problem, parent transid needs to be passed to read_tree_block().
In order to get a valid parent transid, we need to hold the parent's lock until finishing reading child.
This patch needs to be slightly adapted for stable kernels, the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760415c4). The fix is to replace 0 by 'gen'.
Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
6f2f0b39 |
| 13-May-2018 |
Robbie Ko <robbieko@synology.com> |
Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting
[BUG] btrfs incremental send BUG happens when creating a snapshot of snapshot that is being used by send.
[REASON] The
Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting
[BUG] btrfs incremental send BUG happens when creating a snapshot of snapshot that is being used by send.
[REASON] The problem can happen if while we are doing a send one of the snapshots used (parent or send) is snapshotted, because snapshoting implies COWing the root of the source subvolume/snapshot.
1. When doing an incremental send, the send process will get the commit roots from the parent and send snapshots, and add references to them through extent_buffer_get().
2. When a snapshot/subvolume is snapshotted, its root node is COWed (transaction.c:create_pending_snapshot()).
3. COWing releases the space used by the node immediately, through:
__btrfs_cow_block() --btrfs_free_tree_block() ----btrfs_add_free_space(bytenr of node)
4. Because send doesn't hold a transaction open, it's possible that the transaction used to create the snapshot commits, switches the commit root and the old space used by the previous root node gets assigned to some other node allocation. Allocation of a new node will use the existing extent buffer found in memory, which we previously got a reference through extent_buffer_get(), and allow the extent buffer's content (pages) to be modified:
btrfs_alloc_tree_block --btrfs_reserve_extent ----find_free_extent (get bytenr of old node) --btrfs_init_new_buffer (use bytenr of old node) ----btrfs_find_create_tree_block ------alloc_extent_buffer --------find_extent_buffer (get old node)
5. So send can access invalid memory content and have unpredictable behaviour.
[FIX] So we fix the problem by copying the commit roots of the send and parent snapshots and use those copies.
CallTrace looks like this: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1861! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721 ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000 RIP: 0010:[<ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs] RSP: 0018:ffff88041b723b68 EFLAGS: 00010246 RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000 RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000 RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66 R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890 R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001 FS: 00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000) CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 R2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880 ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50 ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400 Call Trace: [<ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs] [<ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs] [<ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs] [<ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs] [<ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs] [<ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs] [<ffffffff81111133>] ? handle_pte_fault+0x373/0x990 [<ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20 [<ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0 [<ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0 [<ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500 [<ffffffff81062f07>] ? check_preempt_curr+0x57/0x90 [<ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990 [<ffffffff81034f83>] ? do_fork+0x113/0x3b0 [<ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c [<ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0 [<ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b ---[ end trace 29576629ee80b2e1 ]---
Fixes: 7069830a9e38 ("Btrfs: add btrfs_compare_trees function") CC: stable@vger.kernel.org # 3.6+ Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
c1d7c514 |
| 03-Apr-2018 |
David Sterba <dsterba@suse.com> |
btrfs: replace GPL boilerplate by SPDX -- sources
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX h
btrfs: replace GPL boilerplate by SPDX -- sources
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header.
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.16 |
|
#
d1980131 |
| 15-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: update barrier in should_cow_block
Once there was a simple int force_cow that was used with the plain barriers, and then converted to a bit, so we should use the appropriate barrier helper.
btrfs: update barrier in should_cow_block
Once there was a simple int force_cow that was used with the plain barriers, and then converted to a bit, so we should use the appropriate barrier helper.
Other variables in the complex if condition do not depend on a barrier, so we should be fine in case the atomic barrier becomes a no-op.
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
581c1760 |
| 28-Mar-2018 |
Qu Wenruo <wqu@suse.com> |
btrfs: Validate child tree block's level and first key
We have several reports about node pointer points to incorrect child tree blocks, which could have even wrong owner and level but still with va
btrfs: Validate child tree block's level and first key
We have several reports about node pointer points to incorrect child tree blocks, which could have even wrong owner and level but still with valid generation and checksum.
Although btrfs check could handle it and print error message like: leaf parent key incorrect 60670574592
Kernel doesn't have enough check on this type of corruption correctly. At least add such check to read_tree_block() and btrfs_read_buffer(), where we need two new parameters @level and @first_key to verify the child tree block.
The new @level check is mandatory and all call sites are already modified to extract expected level from its call chain.
While @first_key is optional, the following call sites are skipping such check: 1) Root node/leaf As ROOT_ITEM doesn't contain the first key, skip @first_key check. 2) Direct backref Only parent bytenr and level is known and we need to resolve the key all by ourselves, skip @first_key check.
Another note of this verification is, it needs extra info from nodeptr or ROOT_ITEM, so it can't fit into current tree-checker framework, which is limited to node/leaf boundary.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
d9d19a01 |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: kill tree_mod_log_set_root_pointer helper
A useless wrapper around tree_mod_log_insert_root that hides missing error handling. Move it to the callers.
Reviewed-by: Nikolay Borisov <nborisov@
btrfs: kill tree_mod_log_set_root_pointer helper
A useless wrapper around tree_mod_log_insert_root that hides missing error handling. Move it to the callers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
0e82bcfe |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: kill tree_mod_log_set_node_key helper
A trivial wrapper that can be simply opencoded and makes the GFP allocation request more visible. The error handling is now moved to the callers.
Review
btrfs: kill tree_mod_log_set_node_key helper
A trivial wrapper that can be simply opencoded and makes the GFP allocation request more visible. The error handling is now moved to the callers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
bf1d3425 |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: kill trivial wrapper tree_mod_log_eb_move
The wrapper is effectively an alias for tree_mod_log_insert_move but also hides the missing error handling. To make that more visible, lift the BUG_O
btrfs: kill trivial wrapper tree_mod_log_eb_move
The wrapper is effectively an alias for tree_mod_log_insert_move but also hides the missing error handling. To make that more visible, lift the BUG_ON to the callers.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
b1a09f1e |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: remove trivial locking wrappers of tree mod log
The wrappers are trivial and do not bring any extra value on top of the plain locking primitives.
Reviewed-by: Nikolay Borisov <nborisov@suse.
btrfs: remove trivial locking wrappers of tree mod log
The wrappers are trivial and do not bring any extra value on top of the plain locking primitives.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
bcd24dab |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop fs_info parameter from __tree_mod_log_oldest_root
It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
b6dfa35b |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: embed tree_mod_move structure to tree_mod_elem
The tree_mod_move is not used anywhere and can be embedded as anonymous structure.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a446a979 |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop unused fs_info parameter from tree_mod_log_eb_move
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
95b757c1 |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop fs_info parameter from tree_mod_log_free_eb
It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
db7279a2 |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop fs_info parameter from tree_mod_log_free_eb
It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e09c2efe |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop fs_info parameter from tree_mod_log_insert_key
It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6074d45f |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop fs_info parameter from tree_mod_log_insert_move
It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3ac6de1a |
| 05-Mar-2018 |
David Sterba <dsterba@suse.com> |
btrfs: drop fs_info parameter from tree_mod_log_set_node_key
It's provided by the extent_buffer.
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
448f3a17 |
| 27-Feb-2018 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Remove redundant comment from btrfs_search_forward
This function always sets keep_locks to 1 and saves the old value of keep_locks which is restored at the end. So there is no way it can be c
btrfs: Remove redundant comment from btrfs_search_forward
This function always sets keep_locks to 1 and saves the old value of keep_locks which is restored at the end. So there is no way it can be called without keep_locks being set. Remove comment imposing redundant requirement on callers.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.15 |
|
#
4271ecea |
| 13-Dec-2017 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Improve btrfs_search_slot description
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
a74b35ec |
| 08-Dec-2017 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Rename bin_search -> btrfs_bin_search
Currently there are 2 function doing binary search on btrfs nodes: bin_search and btrfs_bin_search. The latter being a simple wrapper for the former. So
btrfs: Rename bin_search -> btrfs_bin_search
Currently there are 2 function doing binary search on btrfs nodes: bin_search and btrfs_bin_search. The latter being a simple wrapper for the former. So eliminate the wrapper and just rename bin_search to btrfs_bin_search. No functional changes
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9ea2c7c9 |
| 12-Dec-2017 |
Nikolay Borisov <nborisov@suse.com> |
btrfs: Fix out of bounds access in btrfs_search_slot
When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then the level variable is going to be 7 (this is the max height of the tree). On
btrfs: Fix out of bounds access in btrfs_search_slot
When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then the level variable is going to be 7 (this is the max height of the tree). On the other hand btrfs_cow_block is always called with "level + 1" as an index into the nodes and slots arrays. This leads to an out of bounds access. Admittdely this will be benign since an OOB access of the nodes array will likely read the 0th element from the slots array, which in this case is going to be 0 (since we start CoW at the top of the tree). The OOB access into the slots array in turn will read the 0th and 1st values of the locks array, which would both be 0 at the time. However, this benign behavior relies on the fact that the path being passed hasn't been initialised, if it has already been used to query a btree then it could potentially have populated the nodes/slots arrays.
Fix it by explicitly checking if we are at level 7 (the maximum allowed index in nodes/slots arrays) and explicitly call the CoW routine with NULL for parent's node/slot.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Fixes-coverity-id: 711515 Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v4.13.16 |
|
#
692826b2 |
| 21-Nov-2017 |
Jeff Mahoney <jeffm@suse.com> |
btrfs: handle errors while updating refcounts in update_ref_for_cow
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting time out of commit trans) the assumption that btrfs_ad
btrfs: handle errors while updating refcounts in update_ref_for_cow
Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting time out of commit trans) the assumption that btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has been false. The qgroup operations call into btrfs_search_slot and friends and can now return the full spectrum of error codes.
Fortunately, the fix here is easy since update_ref_for_cow failing is already handled so we just need to bail early with the error code.
Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...) Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Edmund Nadolski <enadolski@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|