#
57c21994 |
| 04-Dec-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: drop unused ttl_from parameter from fill_inode
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
1f08529c |
| 29-Oct-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
ceph: add missing check in d_revalidate snapdir handling
We should not play with dcache without parent locked...
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
c62498d7 |
| 25-Jul-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: update the mtime when truncating up
If we have Fx caps and we're truncating the size to be larger, then we'll cache the size attribute change, but the mtime won't be updated.
Move the size handling before the mtime, and add ATTR_MTIME to ia_valid in that case to make sure the mtime also gets updated.
This fixes xfstest generic/313.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
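The ordering described in this commit can be sketched in a few lines of userspace C. This is only an illustration, not the kernel code: the `SK_`-prefixed names, the `sk_setattr` structure, and the flag values are hypothetical stand-ins for the kernel's `ATTR_SIZE`, `ATTR_MTIME`, and `struct iattr`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ATTR_* bits; values are illustrative. */
#define SK_ATTR_SIZE  (1u << 3)
#define SK_ATTR_MTIME (1u << 5)

/* Hypothetical, reduced version of a setattr request. */
struct sk_setattr {
    unsigned int ia_valid; /* which attributes the caller wants changed */
    uint64_t ia_size;      /* requested new size */
};

/*
 * Handle the size first. If the size change will only be cached locally
 * (we hold Fx caps) and the file grows, also request an mtime update so
 * that the mtime handling which runs afterwards picks it up.
 */
static unsigned int sk_prepare_truncate(struct sk_setattr *attr,
                                        uint64_t cur_size,
                                        bool have_fx_caps)
{
    if ((attr->ia_valid & SK_ATTR_SIZE) && have_fx_caps &&
        attr->ia_size > cur_size)
        attr->ia_valid |= SK_ATTR_MTIME;

    return attr->ia_valid;
}
```

The point of the sketch is the order: the size check runs before any mtime processing, so the added bit is visible to it.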
|
#
f4b97866 |
| 25-Jul-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: track and report error of async metadata operation
Use errseq_t to track and report errors of async metadata operations, similar to how kernel handles errors during writeback.
If any dirty caps or any unsafe request gets dropped during session eviction, record -EIO in corresponding inode's i_meta_err. The error will be reported by subsequent fsync,
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
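The "record once, report to each observer once" idea behind this commit can be illustrated with a deliberately simplified model. This is not the kernel's `errseq_t`, which packs the error, a counter, and a "seen" flag into one 32-bit value; every `sk_` name below is hypothetical.

```c
#include <errno.h>

/* Simplified error-sequence tracker: latest error plus a change counter. */
struct sk_errseq {
    int err;          /* most recent error, e.g. -EIO */
    unsigned int seq; /* bumped every time an error is recorded */
};

/* Record a new error (for example when dirty caps are dropped). */
static void sk_errseq_set(struct sk_errseq *e, int err)
{
    e->err = err;
    e->seq++;
}

/* Take a cursor, typically when a file is opened. */
static unsigned int sk_errseq_sample(const struct sk_errseq *e)
{
    return e->seq;
}

/*
 * Report an error only if one was recorded since the caller's cursor,
 * then advance the cursor so the same error is not reported twice.
 */
static int sk_errseq_check_and_advance(const struct sk_errseq *e,
                                       unsigned int *since)
{
    if (*since == e->seq)
        return 0;
    *since = e->seq;
    return e->err;
}
```

In this model an fsync-like caller would invoke `sk_errseq_check_and_advance()` with its own cursor, so each open file sees a dropped-metadata error once.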
|
#
75067034 |
| 23-Jul-2019 |
Luis Henriques <lhenriques@suse.com> |
ceph: fix directories inode i_blkbits initialization
When filling an inode with info from the MDS, i_blkbits is being initialized using fl_stripe_unit, which contains the stripe unit in bytes. Unfortunately, this doesn't make sense for directories as they have fl_stripe_unit set to '0'. This means that i_blkbits will be set to 0xff, causing an UBSAN undefined behaviour in i_blocksize():
UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 shift exponent 255 is too large for 32-bit type 'int'
Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit is zero.
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
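The arithmetic behind the 0xff value can be reproduced in userspace. The sketch below assumes the block shift is derived as "index of the highest set bit, minus one", which is consistent with a zero stripe unit producing 0xff in an 8-bit field; `fls_sketch()` and the `blkbits_*` helpers are hypothetical names, and the value 22 for `CEPH_BLOCK_SHIFT` is an assumption (a 4 MB block).

```c
#include <stdint.h>

/* Assumed value (4 MB blocks); named after the constant in the commit message. */
#define CEPH_BLOCK_SHIFT 22

/* Portable stand-in for an fls()-style helper: 1-based index of the
 * highest set bit, or 0 when x is 0. */
static int fls_sketch(uint32_t x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Unfixed behaviour: for stripe_unit == 0 this is (uint8_t)(0 - 1) == 0xff. */
static uint8_t blkbits_unchecked(uint32_t stripe_unit)
{
    return (uint8_t)(fls_sketch(stripe_unit) - 1);
}

/* Fixed behaviour: fall back to the default shift for a zero stripe unit. */
static uint8_t blkbits_checked(uint32_t stripe_unit)
{
    if (stripe_unit == 0)
        return CEPH_BLOCK_SHIFT;
    return (uint8_t)(fls_sketch(stripe_unit) - 1);
}
```

A shift count of 255 applied to a 32-bit `int` is what UBSAN flags; the checked variant keeps the result within a usable range for directories.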
|
#
af8a85a4 |
| 19-Jul-2019 |
Luis Henriques <lhenriques@suse.com> |
ceph: fix buffer free while holding i_ceph_lock in fill_inode()
Calling ceph_buffer_put() in fill_inode() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released.
The following backtrace was triggered by fstests generic/070.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4
6 locks held by kworker/0:4/3852:
 #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0
 #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0
 #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476
 #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476
 #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476
 #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70
CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Workqueue: ceph-msgr ceph_con_workfn
Call Trace:
 dump_stack+0x67/0x90
 ___might_sleep.cold+0x9f/0xb1
 vfree+0x4b/0x60
 ceph_buffer_release+0x1b/0x60
 fill_inode.isra.0+0xa9b/0xf70
 ceph_fill_trace+0x13b/0xc70
 ? dispatch+0x2eb/0x1476
 dispatch+0x320/0x1476
 ? __mutex_unlock_slowpath+0x4d/0x2a0
 ceph_con_workfn+0xc97/0x2ec0
 ? process_one_work+0x1b8/0x5f0
 process_one_work+0x244/0x5f0
 worker_thread+0x4d/0x3e0
 kthread+0x105/0x140
 ? process_one_work+0x5f0/0x5f0
 ? kthread_park+0x90/0x90
 ret_from_fork+0x3a/0x50
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
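The "postpone the put until the lock is released" pattern can be shown with a small model. Nothing here is the kernel code: the lock is modelled as a boolean so the sketch can record whether a release ever happened while it was held, and all `sk_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced inode: a modelled lock and one xattr blob. */
struct sk_inode {
    bool lock_held;        /* models the inode lock being held */
    char *xattr_blob;      /* models the xattr blob buffer */
    bool freed_under_lock; /* set if a buffer was released with the lock held */
};

/* Release a buffer; a real release may sleep, so it must not run under the lock. */
static void sk_buffer_put(struct sk_inode *in, char *buf)
{
    if (in->lock_held)
        in->freed_under_lock = true; /* the bug this commit fixes */
    free(buf);
}

/* Swap in a new blob under the lock, but release the old one afterwards. */
static void sk_fill_blob(struct sk_inode *in, char *new_blob)
{
    char *old_blob;

    in->lock_held = true;   /* lock */
    old_blob = in->xattr_blob;
    in->xattr_blob = new_blob;
    in->lock_held = false;  /* unlock */

    if (old_blob)
        sk_buffer_put(in, old_blob); /* safe: lock no longer held */
}
```

The design choice is to carry the old pointer out of the critical section in a local variable, which keeps the locked region free of anything that might sleep.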
|
#
52dd0f1b |
| 05-Jul-2019 |
Luis Henriques <lhenriques@suse.com> |
ceph: use generic_delete_inode() for ->drop_inode
ceph_drop_inode() implementation is not any different from the generic function, thus there's no point in keeping it around.
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
87bc5b89 |
| 01-Jun-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: use ceph_evict_inode to cleanup inode's resource
remove_session_caps() relies on __wait_on_freeing_inode(), to wait for freeing inode to remove its caps. But VFS wakes freeing inode waiters before calling destroy_inode().
Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/40102 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
a35ead31 |
| 06-Jun-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: add change_attr field to ceph_inode_info
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
58981784 |
| 05-Jun-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: allow querying of STATX_BTIME in ceph_getattr
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
245ce991 |
| 29-May-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: add btime field to ceph_inode_info
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
ac6713cc |
| 26-May-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: add selinux support
When creating new file/directory, use security_dentry_init_security() to prepare selinux context for the new inode, then send openc/mkdir request to MDS, together with selinux xattr.
security_dentry_init_security() only supports single security module and only selinux has dentry_init_security hook. So only selinux is supported for now. We can add support for other security modules once kernel has a generic version of dentry_init_security()
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
d6e47819 |
| 22-May-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: hold i_ceph_lock when removing caps for freeing inode
ceph_d_revalidate(, LOOKUP_RCU) may call __ceph_caps_issued_mask() on a freeing inode.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
543212b3 |
| 22-May-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: close race between d_name_cmp() and update_dentry_lease()
d_name_cmp() and update_dentry_lease() lock and unlock dentry->d_lock respectively. Dentry may get renamed between them. The fix is moving the dentry name compare into update_dentry_lease().
This patch introduces two versions of update_dentry_lease(). One version is for the case where the parent inode is locked; it does not need to check the parent/target inode and dentry name. The other version is for the case where the parent inode is not locked; it checks the parent/target inode and dentry name after locking dentry->d_lock.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
193e7b37 |
| 18-Apr-2019 |
David Disseldorp <ddiss@suse.de> |
ceph: carry snapshot creation time with inodes
MDS InodeStat v3 wire structures include a trailing snapshot creation time member. Unmarshall this and retain it for a future vxattr.
Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
3e1d0452 |
| 18-May-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: avoid iput_final() while holding mutex or in dispatch thread
iput_final() may wait for readahead pages. The wait can cause deadlock. For example:
Workqueue: ceph-msgr ceph_con_workfn [libceph]
Call Trace:
 schedule+0x36/0x80
 io_schedule+0x16/0x40
 __lock_page+0x101/0x140
 truncate_inode_pages_range+0x556/0x9f0
 truncate_inode_pages_final+0x4d/0x60
 evict+0x182/0x1a0
 iput+0x1d2/0x220
 iterate_session_caps+0x82/0x230 [ceph]
 dispatch+0x678/0xa80 [ceph]
 ceph_con_workfn+0x95b/0x1560 [libceph]
 process_one_work+0x14d/0x410
 worker_thread+0x4b/0x460
 kthread+0x105/0x140
 ret_from_fork+0x22/0x40

Workqueue: ceph-msgr ceph_con_workfn [libceph]
Call Trace:
 __schedule+0x3d6/0x8b0
 schedule+0x36/0x80
 schedule_preempt_disabled+0xe/0x10
 mutex_lock+0x2f/0x40
 ceph_check_caps+0x505/0xa80 [ceph]
 ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph]
 writepages_finish+0x2d3/0x410 [ceph]
 __complete_request+0x26/0x60 [libceph]
 handle_reply+0x6c8/0xa10 [libceph]
 dispatch+0x29a/0xbb0 [libceph]
 ceph_con_workfn+0x95b/0x1560 [libceph]
 process_one_work+0x14d/0x410
 worker_thread+0x4b/0x460
 kthread+0x105/0x140
 ret_from_fork+0x22/0x40
In above example, truncate_inode_pages_range() waits for readahead pages while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks OSD dispatch thread. Later OSD replies (for readahead) can't be handled.
ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock can happen if iput_final() is called while holding snap_rwsem.
In general, it's not good to call iput_final() inside MDS/OSD dispatch threads or while holding any mutex.
The fix is introducing ceph_async_iput(), which calls iput_final() in workqueue.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
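The deferral idea can be sketched as a single-threaded model: drop a reference inline when it is not the last one, otherwise hand the inode to a worker that performs the final put in a context where blocking is acceptable. This is a sketch under those assumptions, not the kernel's ceph_async_iput(); the `sk_` names, the plain integer refcount, and the global list are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced inode with a plain (non-atomic) reference count. */
struct sk_inode {
    int refcount;
    bool evicted;                   /* set once the final put has run */
    struct sk_inode *next_deferred; /* link in the deferred-put list */
};

/* Inodes whose last reference must be dropped by the worker. */
static struct sk_inode *sk_deferred_head;

/*
 * Called from contexts that must not block (dispatch thread, under a mutex).
 * Only a non-final reference is dropped here; the final one is queued.
 */
static void sk_async_iput(struct sk_inode *in)
{
    if (in->refcount > 1) {
        in->refcount--;
        return;
    }
    in->next_deferred = sk_deferred_head;
    sk_deferred_head = in;
}

/* Worker context: blocking teardown (the final put) is allowed here. */
static void sk_iput_worker(void)
{
    while (sk_deferred_head) {
        struct sk_inode *in = sk_deferred_head;

        sk_deferred_head = in->next_deferred;
        in->next_deferred = NULL;
        if (--in->refcount == 0)
            in->evicted = true; /* stands in for eviction / page truncation */
    }
}
```

A real implementation would need atomic reference counting and a real workqueue; the model only shows that eviction never runs in the caller's context.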
|
#
1cf89a8d |
| 17-May-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: single workqueue for inode related works
We have three workqueues for inode works. A later patch will introduce one more work for inode. It's not good to introduce more workqueues and add more 'struct work_struct' to 'struct ceph_inode_info'.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
428bb68a |
| 11-Apr-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: properly handle granular statx requests
cephfs can benefit from statx. We can have the client just request caps sufficient for the needed attributes and leave off the rest.
Also, recognize when AT_STATX_DONT_SYNC is set, and just scrape the inode without doing any call in that case. Force a call to the MDS in the event that AT_STATX_FORCE_SYNC is set.
Link: http://tracker.ceph.com/issues/39258 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
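The flag handling described above amounts to a three-way decision, sketched below. The `sk_` names are hypothetical, the fallback flag values are only used if the system headers have not already defined them, and giving AT_STATX_DONT_SYNC precedence when both flags are set is an assumption made for the sketch.

```c
/* Fallback definitions in case no header has provided these flags. */
#ifndef AT_STATX_FORCE_SYNC
#define AT_STATX_FORCE_SYNC 0x2000
#endif
#ifndef AT_STATX_DONT_SYNC
#define AT_STATX_DONT_SYNC 0x4000
#endif

/* What a getattr implementation could do for a given set of statx flags. */
enum sk_sync_action {
    SK_USE_CACHED,    /* answer from the in-memory inode, no remote call */
    SK_ASK_IF_NEEDED, /* default: contact the server only when required */
    SK_ALWAYS_ASK     /* force a round trip to the server */
};

static enum sk_sync_action sk_statx_action(unsigned int flags)
{
    if (flags & AT_STATX_DONT_SYNC)
        return SK_USE_CACHED;
    if (flags & AT_STATX_FORCE_SYNC)
        return SK_ALWAYS_ASK;
    return SK_ASK_IF_NEEDED;
}
```

The "granular" part of the commit, requesting only the caps needed for the attributes in the statx request mask, would sit inside the `SK_ASK_IF_NEEDED` branch and is not modelled here.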
|
#
cfa6d412 |
| 10-Apr-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
ceph: use ->free_inode()
a lot of non-delayed work in this case; all of that is left in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4b822287 |
| 15-Apr-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: handle the case where a dentry has been renamed on outstanding req
It's possible for us to issue a lookup to revalidate a dentry concurrently with a rename. If done in the right order, then we could end up processing dentry info in the reply that no longer reflects the state of the dentry.
If req->r_dentry->d_name differs from the one in the trace, then just ignore the trace in the reply. We only need to do this however if the parent's i_rwsem is not held.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v5.0.5 |
|
#
daf5cc27 |
| 25-Mar-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
ceph: fix use-after-free on symlink traversal
free the symlink body after the same RCU delay we have for freeing the struct inode itself, so that traversal during RCU pathwalk wouldn't step into freed memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20 |
|
#
37c4efc1 |
| 31-Jan-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: periodically trim stale dentries
The previous commit makes VFS delete a stale dentry when the last reference is dropped. A lease can also become invalid when the corresponding dentry has no reference. This patch makes cephfs periodically scan the lease list and delete the corresponding dentry if the lease is invalid.
There are two types of lease: dentry lease and dir lease. A dentry lease has a lifetime and applies to a single dentry. A dentry lease is added to the tail of a list when it's updated, so leases at the front of the list expire first. A dir lease is CEPH_CAP_FILE_SHARED on the directory inode; it applies to all dentries in the directory. Dentries that have dir leases are added to another list. Dentries in that list are periodically checked in a round-robin manner.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
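The dentry-lease list behaviour (move to the tail on update, trim expired entries from the front) can be sketched with a small circular doubly linked list. The names are hypothetical, time is a plain counter, and the sketch assumes leases have similar durations so that list order matches expiry order.

```c
/* Hypothetical lease node; the list head is a node of the same type. */
struct sk_lease {
    unsigned long expires;
    struct sk_lease *prev, *next;
};

/* Initialise a head or an unlinked node: it points to itself. */
static void sk_list_init(struct sk_lease *node)
{
    node->prev = node->next = node;
}

/* Remove a node from whatever list it is on and leave it self-linked. */
static void sk_unlink(struct sk_lease *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
    l->prev = l->next = l;
}

/* A lease was renewed: set its expiry and move it to the tail. */
static void sk_lease_update(struct sk_lease *head, struct sk_lease *l,
                            unsigned long expires)
{
    sk_unlink(l);
    l->expires = expires;
    l->prev = head->prev;
    l->next = head;
    head->prev->next = l;
    head->prev = l;
}

/* Periodic scan: drop expired leases from the front, stop at the first live one. */
static int sk_trim_expired(struct sk_lease *head, unsigned long now)
{
    int trimmed = 0;

    while (head->next != head && head->next->expires <= now) {
        sk_unlink(head->next);
        trimmed++;
    }
    return trimmed;
}
```

Because renewed leases always go to the tail, the scan can stop at the first unexpired entry instead of walking the whole list. The dir-lease list, which is checked round-robin, is not modelled.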
|
Revision tags: v4.19.19 |
|
#
1e9c2eb6 |
| 28-Jan-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: delete stale dentry when last reference is dropped
introduce ceph_d_delete(), which checks if dentry has valid lease.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v4.19.18, v4.19.17, v4.19.16 |
|
#
e3ec8d68 |
| 14-Jan-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: send cap releases more aggressively
When pending cap releases fill up one message, start a work to send cap release message. (old way is sending cap releases every 5 seconds)
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
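The "flush as soon as one message is full" policy can be modelled with a counter. This is a rough sketch only: the per-message capacity, the `sk_` names, and the synchronous flush are assumptions, whereas the commit describes starting a work item to send the message.

```c
#include <stddef.h>

/* Hypothetical number of cap releases that fit in one message. */
#define SK_CAPS_PER_MSG 4

/* Hypothetical per-session bookkeeping. */
struct sk_session {
    size_t pending; /* cap releases queued but not yet sent */
    size_t flushes; /* how many release messages have been sent */
};

/* Stands in for queueing the work that sends the release message. */
static void sk_flush_cap_releases(struct sk_session *s)
{
    s->flushes++;
    s->pending = 0;
}

/* Queue one release; send right away once a full message has accumulated. */
static void sk_queue_cap_release(struct sk_session *s)
{
    s->pending++;
    if (s->pending >= SK_CAPS_PER_MSG)
        sk_flush_cap_releases(s);
}
```

A periodic timer, like the five-second interval mentioned in the commit, could still call the flush helper to send partially filled messages.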
|
Revision tags: v4.19.15, v4.19.14 |
|
#
08796873 |
| 08-Jan-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: support getting ceph.dir.pin vxattr
Link: http://tracker.ceph.com/issues/37576 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|