#
1dd8d470 |
| 03-Sep-2020 |
Xiubo Li <xiubli@redhat.com> |
ceph: metrics for opened files, pinned caps and opened inodes In client for each inode, it may have many opened files and may have been pinned in more than one MDS servers. And some inod
ceph: metrics for opened files, pinned caps and opened inodes In client for each inode, it may have many opened files and may have been pinned in more than one MDS servers. And some inodes are idle, which have no any opened files. This patch will show these metrics in the debugfs, likes: item total ----------------------------------------- opened files / total inodes 14 / 5 pinned i_caps / total inodes 7 / 5 opened inodes / total inodes 3 / 5 Will send these metrics to ceph, which will be used by the `fs top`, later. [ jlayton: drop unrelated hunk, count hashed inodes instead of allocated ones ] URL: https://tracker.ceph.com/issues/47005 Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
2678da88 |
| 03-Sep-2020 |
Xiubo Li <xiubli@redhat.com> |
ceph: add ceph_sb_to_mdsc helper support to parse the mdsc This will help simplify the code. [ jlayton: fix minor merge conflict in quota.c ] Signed-off-by: Xiubo Li <xiubl
ceph: add ceph_sb_to_mdsc helper support to parse the mdsc This will help simplify the code. [ jlayton: fix minor merge conflict in quota.c ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
ebce3eb2 |
| 18-Aug-2020 |
Jeff Layton <jlayton@kernel.org> |
ceph: fix inode number handling on arches with 32-bit ino_t Tuan and Ulrich mentioned that they were hitting a problem on s390x, which has a 32-bit ino_t value, even though it's a 64-bit
ceph: fix inode number handling on arches with 32-bit ino_t Tuan and Ulrich mentioned that they were hitting a problem on s390x, which has a 32-bit ino_t value, even though it's a 64-bit arch (for historical reasons). I think the current handling of inode numbers in the ceph driver is wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's actually not a problem. 32-bit arches can deal with 64-bit inode numbers just fine when userland code is compiled with LFS support (the common case these days). What we really want to do is just use 64-bit numbers everywhere, unless someone has mounted with the ino32 mount option. In that case, we want to ensure that we hash the inode number down to something that will fit in 32 bits before presenting the value to userland. Add new helper functions that do this, and only do the conversion before presenting these values to userland in getattr and readdir. The inode table hashvalue is changed to just cast the inode number to unsigned long, as low-order bits are the most likely to vary anyway. While it's not strictly required, we do want to put something in inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on the size of the ino_t type. NOTE: This is a user-visible change on 32-bit arches: 1/ inode numbers will be seen to have changed between kernel versions. 32-bit arches will see large inode numbers now instead of the hashed ones they saw before. 2/ any really old software not built with LFS support may start failing stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we can do about these, but hopefully the intersection of people running such code on ceph will be very small. The workaround for both problems is to mount with "-o ino32". [ idryomov: changelog tweak ] URL: https://tracker.ceph.com/issues/46828 Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Reported-and-Tested-by: Tuan Hoang1 <Tuan.Hoang1@ibm.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27 |
|
#
1af16d54 |
| 19-Mar-2020 |
Xiubo Li <xiubli@redhat.com> |
ceph: add caps perf metric for each superblock Count hits and misses in the caps cache. If the client has all of the necessary caps when a task needs references, then it's counted as
ceph: add caps perf metric for each superblock Count hits and misses in the caps cache. If the client has all of the necessary caps when a task needs references, then it's counted as a hit. Any other situation is a miss. URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.4.26, v5.4.25 |
|
#
ef915725 |
| 11-Mar-2020 |
Luis Henriques <lhenriques@suse.com> |
ceph: fix snapshot directory timestamps The .snap directory timestamps are kept at 0 (1970-01-01 00:00), which isn't consistent with what the fuse client does. This patch makes the
ceph: fix snapshot directory timestamps The .snap directory timestamps are kept at 0 (1970-01-01 00:00), which isn't consistent with what the fuse client does. This patch makes the behaviour consistent, by setting these timestamps (atime, btime, ctime, mtime) to those of the parent directory. Cc: Marc Roos <M.Roos@f1-outsourcing.eu> Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.4.24 |
|
#
bf73c62e |
| 05-Mar-2020 |
Yan, Zheng <zyan@redhat.com> |
ceph: check all mds' caps after page writeback If an inode has caps from multiple mds's, the following can happen: - non-auth mds revokes Fsc. Fcb is used, so page writeback is queu
ceph: check all mds' caps after page writeback If an inode has caps from multiple mds's, the following can happen: - non-auth mds revokes Fsc. Fcb is used, so page writeback is queued. - when writeback finishes, ceph_check_caps() is called with auth only flag. ceph_check_caps() invalidates pagecache, but skips checking any non-auth caps. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
135e671e |
| 05-Mar-2020 |
Yan, Zheng <zyan@redhat.com> |
ceph: simplify calling of ceph_get_fmode() Originally, calling ceph_get_fmode() for open files is by thread that handles request reply. There is a small window between updating caps and
ceph: simplify calling of ceph_get_fmode() Originally, calling ceph_get_fmode() for open files is by thread that handles request reply. There is a small window between updating caps and and waking the request initiator. We need to prevent ceph_check_caps() from releasing wanted caps in the window. Previous patches made fill_inode() call __ceph_touch_fmode() for open file requests. This prevented ceph_check_caps() from releasing wanted caps for 'caps_wanted_delay_min' seconds, enough for request initiator to get woken up and call ceph_get_fmode(). This allows us to now call ceph_get_fmode() in ceph_open() instead. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
a0d93e32 |
| 05-Mar-2020 |
Yan, Zheng <zyan@redhat.com> |
ceph: remove delay check logic from ceph_check_caps() __ceph_caps_file_wanted() already checks 'caps_wanted_delay_min' and 'caps_wanted_delay_max'. There is no need to duplicate the logi
ceph: remove delay check logic from ceph_check_caps() __ceph_caps_file_wanted() already checks 'caps_wanted_delay_min' and 'caps_wanted_delay_max'. There is no need to duplicate the logic in ceph_check_caps() and __send_cap() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
719a2514 |
| 05-Mar-2020 |
Yan, Zheng <zyan@redhat.com> |
ceph: consider inode's last read/write when calculating wanted caps Add i_last_rd and i_last_wr to ceph_inode_info. These fields are used to track the last time the client acquired read/
ceph: consider inode's last read/write when calculating wanted caps Add i_last_rd and i_last_wr to ceph_inode_info. These fields are used to track the last time the client acquired read/write caps for the inode. If there is no read/write on an inode for 'caps_wanted_delay_max' seconds, __ceph_caps_file_wanted() does not request caps for read/write even there are open files. Call __ceph_touch_fmode() for dir operations. __ceph_caps_file_wanted() calculates dir's wanted caps according to last dir read/modification. If there is recent dir read, dir inode wants CEPH_CAP_ANY_SHARED caps. If there is recent dir modification, also wants CEPH_CAP_FILE_EXCL. Readdir is a special case. Dir inode wants CEPH_CAP_FILE_EXCL after readdir, as with that, modifications do not need to release CEPH_CAP_FILE_SHARED or invalidate all dentry leases issued by readdir. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8 |
|
#
785892fe |
| 02-Jan-2020 |
Jeff Layton <jlayton@kernel.org> |
ceph: cache layout in parent dir on first sync create If a create is done, then typically we'll end up writing to the file soon afterward. We don't want to wait for the reply before doin
ceph: cache layout in parent dir on first sync create If a create is done, then typically we'll end up writing to the file soon afterward. We don't want to wait for the reply before doing that when doing an async create, so that means we need the layout for the new file before we've gotten the response from the MDS. All files created in a directory will initially inherit the same layout, so copy off the requisite info from the first synchronous create in the directory, and save it in a new i_cached_layout field. Zero out the layout when we lose Dc caps in the dir. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3 |
|
#
966c7160 |
| 05-Dec-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: make ceph_fill_inode non-static Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
f5e17aed |
| 18-Feb-2020 |
Jeff Layton <jlayton@kernel.org> |
ceph: track primary dentry link Newer versions of the MDS will flag a dentry as "primary". In later patches, we'll need to consult this info, so track it in di->flags. Signed-of
ceph: track primary dentry link Newer versions of the MDS will flag a dentry as "primary". In later patches, we'll need to consult this info, so track it in di->flags. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.3.15, v5.4.2 |
|
#
3bb48b41 |
| 02-Dec-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: add flag to designate that a request is asynchronous ...and ensure that such requests are never queued. The MDS has need to know that a request is asynchronous so add flags and pro
ceph: add flag to designate that a request is asynchronous ...and ensure that such requests are never queued. The MDS has need to know that a request is asynchronous so add flags and proper infrastructure for that. Also, delegated inode numbers and directory caps are associated with the session, so ensure that async requests are always transmitted on the first attempt and are never queued to wait for session reestablishment. If it does end up looking like we'll need to queue the request, then have it return -EJUKEBOX so the caller can reattempt with a synchronous request. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6 |
|
#
f85122af |
| 02-Apr-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: add refcounting for Fx caps In future patches we'll be taking and relying on Fx caps. Add proper refcounting for them. Signed-off-by: Jeff Layton <jlayton@kernel.org>
ceph: add refcounting for Fx caps In future patches we'll be taking and relying on Fx caps. Add proper refcounting for them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
893e456b |
| 11-Dec-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: don't clear I_NEW until inode metadata is fully populated Currently, we could have an open-by-handle (or NFS server) call into the filesystem and start working with an inode before
ceph: don't clear I_NEW until inode metadata is fully populated Currently, we could have an open-by-handle (or NFS server) call into the filesystem and start working with an inode before it's properly filled out. Don't clear I_NEW until we have filled out the inode, and discard it properly if that fails. Note that we occasionally take an extra reference to the inode to ensure that we don't put the last reference in discard_new_inode, but rather leave it for ceph_async_iput. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
9a6bed4f |
| 05-Dec-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: ensure we have a new cap before continuing in fill_inode If the caller passes in a NULL cap_reservation, and we can't allocate one then ensure that we fail gracefully. Sig
ceph: ensure we have a new cap before continuing in fill_inode If the caller passes in a NULL cap_reservation, and we can't allocate one then ensure that we fail gracefully. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
57c21994 |
| 04-Dec-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: drop unused ttl_from parameter from fill_inode Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
1f08529c |
| 29-Oct-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
ceph: add missing check in d_revalidate snapdir handling We should not play with dcache without parent locked... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.li
ceph: add missing check in d_revalidate snapdir handling We should not play with dcache without parent locked... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
c62498d7 |
| 25-Jul-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: update the mtime when truncating up If we have Fx caps, and the we're truncating the size to be larger, then we'll cache the size attribute change, but the mtime won't be updated.
ceph: update the mtime when truncating up If we have Fx caps, and the we're truncating the size to be larger, then we'll cache the size attribute change, but the mtime won't be updated. Move the size handling before the mtime, and add ATTR_MTIME to ia_valid in that case to make sure the mtime also gets updated. This fixes xfstest generic/313. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
f4b97866 |
| 25-Jul-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: track and report error of async metadata operation Use errseq_t to track and report errors of async metadata operations, similar to how kernel handles errors during writeback.
ceph: track and report error of async metadata operation Use errseq_t to track and report errors of async metadata operations, similar to how kernel handles errors during writeback. If any dirty caps or any unsafe request gets dropped during session eviction, record -EIO in corresponding inode's i_meta_err. The error will be reported by subsequent fsync, Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
75067034 |
| 23-Jul-2019 |
Luis Henriques <lhenriques@suse.com> |
ceph: fix directories inode i_blkbits initialization When filling an inode with info from the MDS, i_blkbits is being initialized using fl_stripe_unit, which contains the stripe unit in
ceph: fix directories inode i_blkbits initialization When filling an inode with info from the MDS, i_blkbits is being initialized using fl_stripe_unit, which contains the stripe unit in bytes. Unfortunately, this doesn't make sense for directories as they have fl_stripe_unit set to '0'. This means that i_blkbits will be set to 0xff, causing an UBSAN undefined behaviour in i_blocksize(): UBSAN: Undefined behaviour in ./include/linux/fs.h:731:12 shift exponent 255 is too large for 32-bit type 'int' Fix this by initializing i_blkbits to CEPH_BLOCK_SHIFT if fl_stripe_unit is zero. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
af8a85a4 |
| 19-Jul-2019 |
Luis Henriques <lhenriques@suse.com> |
ceph: fix buffer free while holding i_ceph_lock in fill_inode() Calling ceph_buffer_put() in fill_inode() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. T
ceph: fix buffer free while holding i_ceph_lock in fill_inode() Calling ceph_buffer_put() in fill_inode() may result in freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by postponing the call until later, when the lock is released. The following backtrace was triggered by fstests generic/070. BUG: sleeping function called from invalid context at mm/vmalloc.c:2283 in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4 6 locks held by kworker/0:4/3852: #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0 #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0 #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476 #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476 #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476 #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70 CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014 Workqueue: ceph-msgr ceph_con_workfn Call Trace: dump_stack+0x67/0x90 ___might_sleep.cold+0x9f/0xb1 vfree+0x4b/0x60 ceph_buffer_release+0x1b/0x60 fill_inode.isra.0+0xa9b/0xf70 ceph_fill_trace+0x13b/0xc70 ? dispatch+0x2eb/0x1476 dispatch+0x320/0x1476 ? __mutex_unlock_slowpath+0x4d/0x2a0 ceph_con_workfn+0xc97/0x2ec0 ? process_one_work+0x1b8/0x5f0 process_one_work+0x244/0x5f0 worker_thread+0x4d/0x3e0 kthread+0x105/0x140 ? process_one_work+0x5f0/0x5f0 ? kthread_park+0x90/0x90 ret_from_fork+0x3a/0x50 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
52dd0f1b |
| 05-Jul-2019 |
Luis Henriques <lhenriques@suse.com> |
ceph: use generic_delete_inode() for ->drop_inode ceph_drop_inode() implementation is not any different from the generic function, thus there's no point in keeping it around. Si
ceph: use generic_delete_inode() for ->drop_inode ceph_drop_inode() implementation is not any different from the generic function, thus there's no point in keeping it around. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
87bc5b89 |
| 01-Jun-2019 |
Yan, Zheng <zyan@redhat.com> |
ceph: use ceph_evict_inode to cleanup inode's resource remove_session_caps() relies on __wait_on_freeing_inode(), to wait for freeing inode to remove its caps. But VFS wakes freeing inod
ceph: use ceph_evict_inode to cleanup inode's resource remove_session_caps() relies on __wait_on_freeing_inode(), to wait for freeing inode to remove its caps. But VFS wakes freeing inode waiters before calling destroy_inode(). Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/40102 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
a35ead31 |
| 06-Jun-2019 |
Jeff Layton <jlayton@kernel.org> |
ceph: add change_attr field to ceph_inode_info Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail
ceph: add change_attr field to ceph_inode_info Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|