Revision tags: v3.18-rc4, v3.18-rc3 |
|
#
084b6e7c |
| 30-Oct-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Fix a lockdep warning when running xfstest.
The following lockdep warning is triggered during xfstests:
[ 1702.980872] ========================================================= [ 1702.981181
btrfs: Fix a lockdep warning when running xfstest.
The following lockdep warning is triggered during xfstests:
[ 1702.980872] ========================================================= [ 1702.981181] [ INFO: possible irq lock inversion dependency detected ] [ 1702.981482] 3.18.0-rc1 #27 Not tainted [ 1702.981781] --------------------------------------------------------- [ 1702.982095] kswapd0/77 just changed the state of lock: [ 1702.982415] (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa03b0b51>] __btrfs_release_delayed_node+0x41/0x1f0 [btrfs] [ 1702.982794] but this lock took another, RECLAIM_FS-unsafe lock in the past: [ 1702.983160] (&fs_info->dev_replace.lock){+.+.+.}
and interrupts could create inverse lock ordering between them.
[ 1702.984675] other info that might help us debug this: [ 1702.985524] Chain exists of: &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
[ 1702.986799] Possible interrupt unsafe locking scenario:
[ 1702.987681] CPU0 CPU1 [ 1702.988137] ---- ---- [ 1702.988598] lock(&fs_info->dev_replace.lock); [ 1702.989069] local_irq_disable(); [ 1702.989534] lock(&delayed_node->mutex); [ 1702.990038] lock(&found->groups_sem); [ 1702.990494] <Interrupt> [ 1702.990938] lock(&delayed_node->mutex); [ 1702.991407] *** DEADLOCK ***
It is because the btrfs_kobj_{add/rm}_device() will call memory allocation with GFP_KERNEL, which may flush fs page cache to free space, waiting for it self to do the commit, causing the deadlock.
To solve the problem, move btrfs_kobj_{add/rm}_device() out of the dev_replace lock range, also involing split the btrfs_rm_dev_replace_srcdev() function into remove and free parts.
Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are called out of the lock range.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v3.18-rc2, v3.18-rc1 |
|
#
2fc9f6ba |
| 12-Oct-2014 |
Eryu Guan <guaneryu@gmail.com> |
Btrfs: return failure if btrfs_dev_replace_finishing() failed
device replace could fail due to another running scrub process or any other errors btrfs_scrub_dev() may hit, but this failure doesn't g
Btrfs: return failure if btrfs_dev_replace_finishing() failed
device replace could fail due to another running scrub process or any other errors btrfs_scrub_dev() may hit, but this failure doesn't get returned to userspace.
The following steps could reproduce this issue
mkfs -t btrfs -f /dev/sdb1 /dev/sdb2 mount /dev/sdb1 /mnt/btrfs while true; do btrfs scrub start -B /mnt/btrfs >/dev/null 2>&1; done & btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs # if this replace succeeded, do the following and repeat until # you see this log in dmesg # BTRFS: btrfs_scrub_dev(/dev/sdb2, 2, /dev/sdb3) failed -115 #btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs
# once you see the error log in dmesg, check return value of # replace echo $?
Introduce a new dev replace result
BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS
to catch -EINPROGRESS explicitly and return other errors directly to userspace.
Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4 |
|
#
82372bc8 |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: make the logic of source device removing more clear
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
67a2c45e |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix use-after-free problem of the device during device replace
The problem is: Task0(device scan task) Task1(device replace task) scan_one_device() mutex_lock(&uuid_mutex) device = find_
Btrfs: fix use-after-free problem of the device during device replace
The problem is: Task0(device scan task) Task1(device replace task) scan_one_device() mutex_lock(&uuid_mutex) device = find_device() mutex_lock(&device_list_mutex) lock_chunk() rm_and_free_source_device unlock_chunk() mutex_unlock(&device_list_mutex) check device
Destroying the target device if device replace fails also has the same problem.
We fix this problem by locking uuid_mutex during destroying source device or target device, just like the device remove operation.
It is a temporary solution, we can fix this problem and make the code more clear by atomic counter in the future.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
2196d6e8 |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: Fix misuse of chunk mutex
There were several problems about chunk mutex usage: - Lock chunk mutex when updating metadata. It would cause the nested deadlock because updating metadata might
Btrfs: Fix misuse of chunk mutex
There were several problems about chunk mutex usage: - Lock chunk mutex when updating metadata. It would cause the nested deadlock because updating metadata might need allocate new chunks that need acquire chunk mutex. We remove chunk mutex at this case, because b-tree lock and other lock mechanism can help us. - ABBA deadlock occured between device_list_mutex and chunk_mutex. When we update device status, we must acquire device_list_mutex at the beginning, and then we might get chunk_mutex during the device status update because we need allocate new chunks for metadata COW. But at most place, we acquire chunk_mutex at first and then acquire device list mutex. We need change the lock order. - Some place we needn't acquire chunk_mutex. For example we needn't get chunk_mutex when we free a empty seed fs_devices structure.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
7cc8e58d |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix unprotected device's variants on 32bits machine
->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk lock when we change them, but sometimes we read them without any lock,
Btrfs: fix unprotected device's variants on 32bits machine
->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk lock when we change them, but sometimes we read them without any lock, and we might get unexpected value. We fix this problem like inode's i_size.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
ce7213c7 |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix wrong device bytes_used in the super block
device->bytes_used will be changed when allocating a new chunk, and disk_total_size will be changed if resizing is successful. Meanwhile, the on
Btrfs: fix wrong device bytes_used in the super block
device->bytes_used will be changed when allocating a new chunk, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device.
Though it is not big problem because we don't use it now, but anyway it is better that we make it be consistent with the common metadata, maybe we will use it in the future.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
935e5cc9 |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix wrong disk size when writing super blocks
total_size will be changed when resizing a device, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super bl
Btrfs: fix wrong disk size when writing super blocks
total_size will be changed when resizing a device, and disk_total_size will be changed if resizing is successful. Meanwhile, the on-disk super blocks of the previous transaction might not be updated. Considering the consistency of the metadata in the previous transaction, We should use the size in the previous transaction to check if the super block is beyond the boundary of the device. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
1c43366d |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix unprotected assignment of the target device
We didn't protect the assignment of the target device, it might cause the problem that the super block update was skipped because we might find
Btrfs: fix unprotected assignment of the target device
We didn't protect the assignment of the target device, it might cause the problem that the super block update was skipped because we might find wrong size of the target device during the assignment. Fix it by moving the assignment sentences into the initialization function of the target device. And there is another merit that we can check if the target device is suitable more early.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
c7662111 |
| 03-Sep-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: cleanup double assignment of device->bytes_used when device replace finishes
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
|
Revision tags: v3.17-rc3, v3.17-rc2 |
|
#
12b894cb |
| 20-Aug-2014 |
Qu Wenruo <quwenruo@cn.fujitsu.com> |
btrfs: Fix a deadlock in btrfs_dev_replace_finishing()
btrfs-transacion:5657 [stack snip] btrfs_bio_map() btrfs_bio_counter_inc_blocked() percpu_counter_inc(&fs_info->bio_counter) ###bi
btrfs: Fix a deadlock in btrfs_dev_replace_finishing()
btrfs-transacion:5657 [stack snip] btrfs_bio_map() btrfs_bio_counter_inc_blocked() percpu_counter_inc(&fs_info->bio_counter) ###bio_counter > 0(A) __btrfs_bio_map() btrfs_dev_replace_lock() mutex_lock(dev_replace->lock) ###wait mutex(B)
btrfs:32612 [stack snip] btrfs_dev_replace_start() btrfs_dev_replace_lock() mutex_lock(dev_replace->lock) ###hold mutex(B) btrfs_dev_replace_finishing() btrfs_rm_dev_replace_blocked() wait until percpu_counter_sum == 0 ###wait on bio_counter(A)
This bug can be triggered quite easily by the following test script: http://pastebin.com/MQmb37Cy
This patch will fix the ABBA problem by calling btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().
The consistency of btrfs devices list and their superblocks is protected by device_list_mutex, not btrfs_dev_replace_lock/unlock(). So it is safe the move btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v3.17-rc1 |
|
#
de4c296f |
| 13-Aug-2014 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: fix typo in the log message
there is no matching open parenthesis for the closing parenthesis
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
|
#
63dd86fa |
| 13-Aug-2014 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: fix rw_devices miss match after seed replace
reproducer: reproducer: mount /dev/sdb /btrfs btrfs dev add /dev/sdc /btrfs btrfs rep start -B /dev/sdb /dev/sdd /btrfs umount
btrfs: fix rw_devices miss match after seed replace
reproducer: reproducer: mount /dev/sdb /btrfs btrfs dev add /dev/sdc /btrfs btrfs rep start -B /dev/sdb /dev/sdd /btrfs umount /btrfs
WARNING: CPU: 0 PID: 3882 at fs/btrfs/volumes.c:892 __btrfs_close_devices+0x1c8/0x200 [btrfs]()
which is
WARN_ON(fs_devices->rw_devices);
The problem here is that we did not add one to the rw_devices when we replace the seed device with a writable device.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15 |
|
#
49c6f736 |
| 02-Jun-2014 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: dev replace should replace the sysfs entry
when we replace the device its corresponding sysfs entry has to be replaced as well
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by:
btrfs: dev replace should replace the sysfs entry
when we replace the device its corresponding sysfs entry has to be replaced as well
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
#
c81d5767 |
| 04-Jun-2014 |
Gui Hecheng <guihc.fnst@cn.fujitsu.com> |
btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56
To return EOPNOTSUPP is more user friendly than to return EINVAL, and then user-space tool will show that the dev_replace operation for r
btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56
To return EOPNOTSUPP is more user friendly than to return EINVAL, and then user-space tool will show that the dev_replace operation for raid56 is not currently supported rather than showing that there is an invalid argument.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6 |
|
#
6c255e67 |
| 05-Mar-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock
We needn't flush all delalloc inodes when we doesn't get s_umount lock, or we would make the tasks wait for a long time.
Sig
Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock
We needn't flush all delalloc inodes when we doesn't get s_umount lock, or we would make the tasks wait for a long time.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>
show more ...
|
Revision tags: v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1 |
|
#
c404e0dc |
| 30-Jan-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix use-after-free in the finishing procedure of the device replace
During device replace test, we hit a null pointer deference (It was very easy to reproduce it by running xfstests' btrfs/01
Btrfs: fix use-after-free in the finishing procedure of the device replace
During device replace test, we hit a null pointer deference (It was very easy to reproduce it by running xfstests' btrfs/011 on the devices with the virtio scsi driver). There were two bugs that caused this problem: - We might allocate new chunks on the replaced device after we updated the mapping tree. And we forgot to replace the source device in those mapping of the new chunks. - We might get the mapping information which including the source device before the mapping information update. And then submit the bio which was based on that mapping information after we freed the source device.
For the first bug, we can fix it by doing mapping tree update and source device remove in the same context of the chunk mutex. The chunk mutex is used to protect the allocable device list, the above method can avoid the new chunk allocation, and after we remove the source device, all the new chunks will be allocated on the new device. So it can fix the first bug.
For the second bug, we need make sure all flighting bios are finished and no new bios are produced during we are removing the source device. To fix this problem, we introduced a global @bio_counter, we not only inc/dec @bio_counter outsize of map_blocks, but also inc it before submitting bio and dec @bio_counter when ending bios.
Since Raid56 is a little different and device replace dosen't support raid56 yet, it is not addressed in the patch and I add comments to make sure we will fix it in the future.
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>
show more ...
|
#
391cd9df |
| 30-Jan-2014 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace
the alloc list of the filesystem is protected by ->chunk_mutex, we need get that mutex when we insert the new de
Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace
the alloc list of the filesystem is protected by ->chunk_mutex, we need get that mutex when we insert the new device into the list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com>
show more ...
|
Revision tags: v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5 |
|
#
efe120a0 |
| 20-Dec-2013 |
Frank Holton <fholton@gmail.com> |
Btrfs: convert printk to btrfs_ and fix BTRFS prefix
Convert all applicable cases of printk and pr_* to the btrfs_* macros.
Fix all uses of the BTRFS prefix.
Signed-off-by: Frank Holton <fholton@g
Btrfs: convert printk to btrfs_ and fix BTRFS prefix
Convert all applicable cases of printk and pr_* to the btrfs_* macros.
Fix all uses of the BTRFS prefix.
Signed-off-by: Frank Holton <fholton@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
show more ...
|
Revision tags: v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1 |
|
#
52a15759 |
| 14-Nov-2013 |
Anand Jain <Anand.Jain@oracle.com> |
btrfs: fix typo in the log message
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
#
91aef86f |
| 04-Nov-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: rename btrfs_start_all_delalloc_inodes
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at t
Btrfs: rename btrfs_start_all_delalloc_inodes
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
show more ...
|
#
b0244199 |
| 04-Nov-2013 |
Miao Xie <miaox@cn.fujitsu.com> |
Btrfs: don't wait for the completion of all the ordered extents
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want t
Btrfs: don't wait for the completion of all the ordered extents
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
show more ...
|
Revision tags: v3.12, v3.12-rc7, v3.12-rc6 |
|
#
8b558c5f |
| 16-Oct-2013 |
Zach Brown <zab@redhat.com> |
btrfs: remove fs/btrfs/compat.h
fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline.
Signed-off-by: Zach Brown <zab@redhat.com>
btrfs: remove fs/btrfs/compat.h
fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline.
Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
show more ...
|
#
4546bcae |
| 16-Oct-2013 |
Zach Brown <zab@redhat.com> |
btrfs: use get_seconds() instead of btrfs wrapper
Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
Revision tags: v3.12-rc5 |
|
#
e649e587 |
| 10-Oct-2013 |
Ilya Dryomov <idryomov@gmail.com> |
Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts
For both balance and replace, cancelling involves changing the on-disk state and committing a transaction, which is not a good thing to
Btrfs: disallow 'btrfs {balance,replace} cancel' on ro mounts
For both balance and replace, cancelling involves changing the on-disk state and committing a transaction, which is not a good thing to do on read-only filesystems.
Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
show more ...
|