#
cbed4ff7 |
| 27-Oct-2021 |
Luís Henriques <lhenriques@suse.de> |
ceph: split 'metric' debugfs file into several files
Currently, all the metrics are grouped together in a single file, making it difficult to process this file from scripts. Furthermore, as new met
ceph: split 'metric' debugfs file into several files
Currently, all the metrics are grouped together in a single file, making it difficult to process this file from scripts. Furthermore, as new metrics are added, processing this file will become even more challenging.
This patch turns the 'metric' file into a directory that will contain several files, one for each metric.
Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
631ed4b0 |
| 14-Oct-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: shut down mount on bad mdsmap or fsmap decode
As Greg pointed out, if we get a mangled mdsmap or fsmap, then something has gone very wrong, and we should avoid doing any activity on the filesy
ceph: shut down mount on bad mdsmap or fsmap decode
As Greg pointed out, if we get a mangled mdsmap or fsmap, then something has gone very wrong, and we should avoid doing any activity on the filesystem.
When this occurs, shut down the mount the same way we would with a forced umount by calling ceph_umount_begin when decoding fails on either map. This causes most operations done against the filesystem to return an error. Any dirty data or caps in the cache will be dropped as well.
The effect is not reversible, so the only remedy is to umount.
[ idryomov: print fsmap decoding error ]
URL: https://tracker.ceph.com/issues/52303 Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Greg Farnum <gfarnum@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
5d6451b1 |
| 31-Aug-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: shut down access to inode when async create fails
Add proper error handling for when an async create fails. The inode never existed, so any dirty caps or data are now toast. We already d_drop
ceph: shut down access to inode when async create fails
Add proper error handling for when an async create fails. The inode never existed, so any dirty caps or data are now toast. We already d_drop the dentry in that case, but the now-stale inode may still be around. We want to shut down access to these inodes, and ensure that they can't harbor any more dirty data, which can cause problems at umount time.
When this occurs, flag such inodes as being SHUTDOWN, and trash any caps and cap flushes that may be in flight for them, and invalidate the pagecache for the inode. Add a new helper that can check whether an inode or an entire mount is now shut down, and call it instead of accessing the mount_state directly in places where we test that now.
URL: https://tracker.ceph.com/issues/51279 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
36e6da98 |
| 02-Sep-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: refactor remove_session_caps_cb
Move remove_capsnaps to caps.c. Move the part of remove_session_caps_cb under i_ceph_lock into a separate function that lives in caps.c. Have remove_session_cap
ceph: refactor remove_session_caps_cb
Move remove_capsnaps to caps.c. Move the part of remove_session_caps_cb under i_ceph_lock into a separate function that lives in caps.c. Have remove_session_caps_cb call the new helper after taking the lock.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
f7a67b46 |
| 09-Aug-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: enable async dirops by default
Async dirops have been supported in mainline kernels for quite some time now, and we've recently (as of June) started doing regular testing in teuthology with '-
ceph: enable async dirops by default
Async dirops have been supported in mainline kernels for quite some time now, and we've recently (as of June) started doing regular testing in teuthology with '-o nowsync'. There were a few issues, but we've sorted those out now.
Enable async dirops by default, and change /proc/mounts to show "wsync" when they are disabled rather than "nowsync" when they are enabled.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
1bd85aa6 |
| 07-Oct-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: fix handling of "meta" errors
Currently, we check the wb_err too early for directories, before all of the unsafe child requests have been waited on. In order to fix that we need to check the m
ceph: fix handling of "meta" errors
Currently, we check the wb_err too early for directories, before all of the unsafe child requests have been waited on. In order to fix that we need to check the mapping->wb_err later nearer to the end of ceph_fsync.
We also have an overly-complex method for tracking errors after blocklisting. The errors recorded in cleanup_session_requests go to a completely separate field in the inode, but we end up reporting them the same way we would for any other error (in fsync).
There's no real benefit to tracking these errors in two different places, since the only reporting mechanism for them is in fsync, and we'd need to advance them both every time.
Given that, we can just remove i_meta_err, and convert the places that used it to instead just use mapping->wb_err instead. That also fixes the original problem by ensuring that we do a check_and_advance of the wb_err at the end of the fsync op.
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52864 Reported-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
a76d0a9c |
| 25-Aug-2021 |
Xiubo Li <xiubli@redhat.com> |
ceph: don't WARN if we're forcibly removing the session caps
For example in the case of a forced umount, we'll remove all the session caps even if they are dirty. Move the warning to a wrapper funct
ceph: don't WARN if we're forcibly removing the session caps
For example in the case of a forced umount, we'll remove all the session caps even if they are dirty. Move the warning to a wrapper function and make most of the callers use it. Call the core function when removing caps due to a forced umount.
Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
a6d37ccd |
| 25-Aug-2021 |
Xiubo Li <xiubli@redhat.com> |
ceph: remove the capsnaps when removing caps
capsnaps will take inode references via ihold when queueing to flush. When force unmounting, the client will just close the sessions and may never get a
ceph: remove the capsnaps when removing caps
capsnaps will take inode references via ihold when queueing to flush. When force unmounting, the client will just close the sessions and may never get a flush reply, causing a leak and inode ref leak.
Fix this by removing the capsnaps for an inode when removing the caps.
URL: https://tracker.ceph.com/issues/52295 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
0ba92e1c |
| 02-Aug-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: add ceph_change_snap_realm() helper
Consolidate some fiddly code for changing an inode's snap_realm into a new helper function, and change the callers to use it.
While we're in here, nothing
ceph: add ceph_change_snap_realm() helper
Consolidate some fiddly code for changing an inode's snap_realm into a new helper function, and change the callers to use it.
While we're in here, nothing uses the i_snap_realm_counter field, so remove that from the inode.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Luis Henriques <lhenriques@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
b2f9fa1f |
| 18-Aug-2021 |
Xiubo Li <xiubli@redhat.com> |
ceph: correctly handle releasing an embedded cap flush
The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one.
When force umounting, the client w
ceph: correctly handle releasing an embedded cap flush
The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one.
When force umounting, the client will try to remove all the session caps. During this, it will free them, but that should not be done with the ones embedded in a capsnap.
Fix this by adding a new boolean that indicates that the cap flush is embedded in a capsnap, and skip freeing it if that's set.
At the same time, switch to using list_del_init() when detaching the i_list and g_list heads. It's possible for a forced umount to remove these objects but then handle_cap_flushsnap_ack() races in and does the list_del_init() again, corrupting memory.
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52283 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
0cad6246 |
| 18-Aug-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: add rcu argument to ->get_acl() callback
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
Signed-off-by: Miklos Sz
vfs: add rcu argument to ->get_acl() callback
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
show more ...
|
#
bf2ba432 |
| 06-Jul-2021 |
Luis Henriques <lhenriques@suse.de> |
ceph: reduce contention in ceph_check_delayed_caps()
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work workqueue and it can be kept looping for quite some time if caps keep be
ceph: reduce contention in ceph_check_delayed_caps()
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work workqueue and it can be kept looping for quite some time if caps keep being added back to the mdsc->cap_delay_list. This may result in the watchdog tainting the kernel with the softlockup flag.
This patch breaks this loop if the caps have been recently (i.e. during the loop execution). Any new caps added to the list will be handled in the next run.
Also, allow schedule_delayed() callers to explicitly set the delay value instead of defaulting to 5s, so we can ensure that it runs soon afterward if it looks like there is more work.
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/46284 Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
23c2c76e |
| 04-Jun-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: eliminate ceph_async_iput()
Now that we don't need to hold session->s_mutex or the snap_rwsem when calling ceph_check_caps, we can eliminate ceph_async_iput and just use normal iput calls.
Si
ceph: eliminate ceph_async_iput()
Now that we don't need to hold session->s_mutex or the snap_rwsem when calling ceph_check_caps, we can eliminate ceph_async_iput and just use normal iput calls.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
4364c693 |
| 13-May-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: make ceph_queue_cap_snap static
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
7a971e2c |
| 02-Jun-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: fix error handling in ceph_atomic_open and ceph_lookup
Commit aa60cfc3f7ee broke the error handling in these functions such that they don't handle non-ENOENT errors from ceph_mdsc_do_request p
ceph: fix error handling in ceph_atomic_open and ceph_lookup
Commit aa60cfc3f7ee broke the error handling in these functions such that they don't handle non-ENOENT errors from ceph_mdsc_do_request properly.
Move the checking of -ENOENT out of ceph_handle_snapdir and into the callers, and if we get a different error, return it immediately.
Fixes: aa60cfc3f7ee ("ceph: don't use d_add in ceph_handle_snapdir") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
d4f6b31d |
| 01-Apr-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: don't allow access to MDS-private inodes
The MDS reserves a set of inodes for its own usage, and these should never be accessible to clients. Add a new helper to vet a proposed inode number ag
ceph: don't allow access to MDS-private inodes
The MDS reserves a set of inodes for its own usage, and these should never be accessible to clients. Add a new helper to vet a proposed inode number against that range, and complain loudly and refuse to create or look it up if it's in it.
Also, ensure that the MDS doesn't try to delegate inodes that are in that range or lower. Print a warning if it does, and don't save the range in the xarray.
URL: https://tracker.ceph.com/issues/49922 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
e7f72952 |
| 27-Aug-2020 |
Yanhu Cao <gmayyyha@gmail.com> |
ceph: support getting ceph.dir.rsnaps vxattr
Add support for grabbing the rsnaps value out of the inode info in traces, and exposing that via ceph.dir.rsnaps xattr.
Signed-off-by: Yanhu Cao <gmayyy
ceph: support getting ceph.dir.rsnaps vxattr
Add support for grabbing the rsnaps value out of the inode info in traces, and exposing that via ceph.dir.rsnaps xattr.
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
e72968e1 |
| 05-Apr-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: drop pinned_page parameter from ceph_get_caps
All of the existing callers that don't set this to NULL just drop the page reference at some arbitrary point later in processing. There's no point
ceph: drop pinned_page parameter from ceph_get_caps
All of the existing callers that don't set this to NULL just drop the page reference at some arbitrary point later in processing. There's no point in keeping a page reference that we don't use, so just drop the reference immediately after checking the Uptodate flag.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
aa60cfc3 |
| 01-Mar-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: don't use d_add in ceph_handle_snapdir
It's possible ceph_get_snapdir could end up finding a (disconnected) inode that already exists in the cache. Change the prototype for ceph_handle_snapdir
ceph: don't use d_add in ceph_handle_snapdir
It's possible ceph_get_snapdir could end up finding a (disconnected) inode that already exists in the cache. Change the prototype for ceph_handle_snapdir to return a dentry pointer and have it use d_splice_alias so we don't end up with an aliased dentry in the cache.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
7c46b318 |
| 21-Jan-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: rework PageFsCache handling
With the new fscache API, the PageFsCache bit now indicates that the page is being written to the cache and shouldn't be modified or released until it's finished.
ceph: rework PageFsCache handling
With the new fscache API, the PageFsCache bit now indicates that the page is being written to the cache and shouldn't be modified or released until it's finished.
Change releasepage and invalidatepage to wait on that bit before returning.
Also define FSCACHE_USE_NEW_IO_API so that we opt into the new fscache API.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
e7df4524 |
| 21-Jan-2021 |
Jeff Layton <jlayton@kernel.org> |
ceph: rip out old fscache readpage handling
With the new netfs read helper functions, we won't need a lot of this infrastructure as it handles the pagecache pages itself. Rip out the read handling f
ceph: rip out old fscache readpage handling
With the new netfs read helper functions, we won't need a lot of this infrastructure as it handles the pagecache pages itself. Rip out the read handling for now, and much of the old infrastructure that deals in individual pages.
The cookie handling is mostly unchanged, however.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
a8810cdc |
| 10-Dec-2020 |
Jeff Layton <jlayton@kernel.org> |
ceph: allow queueing cap/snap handling after putting cap references
Testing with the fscache overhaul has triggered some lockdep warnings about circular lock dependencies involving page_mkwrite and
ceph: allow queueing cap/snap handling after putting cap references
Testing with the fscache overhaul has triggered some lockdep warnings about circular lock dependencies involving page_mkwrite and the mmap_lock. It'd be better to do the "real work" without the mmap lock being held.
Change the skip_checking_caps parameter in __ceph_put_cap_refs to an enum, and use that to determine whether to queue check_caps, do it synchronously or not at all. Change ceph_page_mkwrite to do a ceph_put_cap_refs_async().
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
64f28c62 |
| 09-Oct-2020 |
Jeff Layton <jlayton@kernel.org> |
ceph: clean up inode work queueing
Add a generic function for taking an inode reference, setting the I_WORK bit and queueing i_work. Turn the ceph_queue_* functions into static inline wrappers that
ceph: clean up inode work queueing
Add a generic function for taking an inode reference, setting the I_WORK bit and queueing i_work. Turn the ceph_queue_* functions into static inline wrappers that pass in the right bit.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
549c7297 |
| 21-Jan-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has b
fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches.
As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods.
Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
show more ...
|
#
6646ea1c |
| 12-Nov-2020 |
Luis Henriques <lhenriques@suse.de> |
Revert "ceph: allow rename operation under different quota realms"
This reverts commit dffdcd71458e699e839f0bf47c3d42d64210b939.
When doing a rename across quota realms, there's a corner case that
Revert "ceph: allow rename operation under different quota realms"
This reverts commit dffdcd71458e699e839f0bf47c3d42d64210b939.
When doing a rename across quota realms, there's a corner case that isn't handled correctly. Here's a testcase:
mkdir files limit truncate files/file -s 10G setfattr limit -n ceph.quota.max_bytes -v 1000000 mv files limit/
The above will succeed because ftruncate(2) won't immediately notify the MDSs with the new file size, and thus the quota realms stats won't be updated.
Since the possible fixes for this issue would have a huge performance impact, the solution for now is to simply revert to returning -EXDEV when doing a cross quota realms rename.
URL: https://tracker.ceph.com/issues/48203 Signed-off-by: Luis Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|