#
7ed286f3 |
| 09-Jun-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't omit used_replica in target_copy()
Currently target_copy() is used only for sending linger pings, so this doesn't come up, but generally omitting used_replica can hang the client as we wouldn't notice the acting set change (legacy_change in calc_target()) or trigger a warning in handle_reply().
Fixes: 117d96a04f00 ("libceph: support for balanced and localized reads") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
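The failure mode described above comes from a field-by-field copy helper that silently skips a field. A minimal sketch of the idea, assuming a hypothetical trimmed-down `struct target_sketch` (the real target structure has many more fields and is not reproduced here):

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the request target. */
struct target_sketch {
	unsigned int flags;
	int osd;            /* OSD the request was mapped to */
	bool used_replica;  /* request was sent to a non-primary OSD */
};

/*
 * Copy every field that later comparisons depend on. If used_replica
 * were left out, the copy would always look like a primary read and a
 * change in the acting set could go unnoticed.
 */
static void target_copy_sketch(struct target_sketch *dest,
			       const struct target_sketch *src)
{
	dest->flags = src->flags;
	dest->osd = src->osd;
	dest->used_replica = src->used_replica;
}
```

Whenever a field is added to such a structure, the copy helper has to be updated in the same change, which is the step both this fix and the recovery_deletes fix restore.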
|
#
2f3fead6 |
| 09-Jun-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't omit recovery_deletes in target_copy()
Currently target_copy() is used only for sending linger pings, so this doesn't come up, but generally omitting recovery_deletes can result in unneeded resends (force_resend in calc_target()).
Fixes: ae78dd8139ce ("libceph: make RECOVERY_DELETES feature create a new interval") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
Revision tags: v5.4.45, v5.7.1 |
|
#
22d2cfdf |
| 04-Jun-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: move away from global osd_req_flags
osd_req_flags is overly general and doesn't suit its only user (read_from_replica option) well:
- applying osd_req_flags in account_request() affects all OSD requests, including linger (i.e. watch and notify). However, linger requests should always go to the primary even though some of them are reads (e.g. notify has side effects but it is a read because it doesn't result in mutation on the OSDs).
- calls to class methods that are reads are allowed to go to the replica, but most such calls issued for "rbd map" and/or exclusive lock transitions are requested to be resent to the primary via EAGAIN, doubling the latency.
Get rid of global osd_req_flags and set read_from_replica flag only on specific OSD requests instead.
Fixes: 8ad44d5e0d1e ("libceph: read_from_replica option") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
Revision tags: v5.4.44, v5.7 |
|
#
d3798acc |
| 29-May-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: support for alloc hint flags
Allow indicating future I/O pattern via flags. This is supported since Kraken (and bluestore persists flags together with expected_object_size and expected_write_size).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
Revision tags: v5.4.43 |
|
#
8ad44d5e |
| 23-May-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: read_from_replica option
Expose replica reads through read_from_replica=balance and read_from_replica=localize. The default is to read from primary (read_from_replica=no).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
117d96a0 |
| 23-May-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: support for balanced and localized reads
OSD-side issues with reads from replica have been resolved in Octopus. Reading from replica should be safe wrt. unstable or uncommitted state now, so add support for balanced and localized reads.
There are two cases when a read from replica can't be served:
- OSD may silently drop the request, expecting the client to notice that the acting set has changed and resend via the usual means (handled with t->used_replica)
- OSD may return EAGAIN, expecting the client to resend to the primary, ignoring replica read flags (see handle_reply())
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
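The EAGAIN case above can be sketched as follows. This is an illustration only: `struct req_sketch`, the `SK_FLAG_*` values and `should_resend_to_primary()` are hypothetical names, not the kernel's handle_reply() code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for the replica read flags. */
#define SK_FLAG_BALANCE_READS  0x1u
#define SK_FLAG_LOCALIZE_READS 0x2u
#define SK_REPLICA_READ_FLAGS  (SK_FLAG_BALANCE_READS | SK_FLAG_LOCALIZE_READS)

struct req_sketch {
	unsigned int flags;
};

/*
 * Returns true if the request should be resent to the primary.
 * On -EAGAIN for a replica read, the replica read flags are cleared
 * so that the resend targets the primary.
 */
static bool should_resend_to_primary(struct req_sketch *req, int result)
{
	if (result != -EAGAIN)
		return false;
	if (!(req->flags & SK_REPLICA_READ_FLAGS))
		return false;	/* not a replica read, pass the error up */

	req->flags &= ~SK_REPLICA_READ_FLAGS;
	return true;
}
```

Clearing the flags before resending keeps the retry from bouncing between replicas: a second -EAGAIN on the same request is no longer treated as a replica read failure.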
|
Revision tags: v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27 |
|
#
97e27aaa |
| 19-Mar-2020 |
Xiubo Li <xiubli@redhat.com> |
ceph: add read/write latency metric support
Calculate the latency for OSD read requests. Add a new r_end_stamp field to struct ceph_osd_request that will hold the time at which the reply was received. Use that to calculate the RTT for each call, and divide the sum of those by the number of calls to get the average RTT.
Keep a tally of RTT for OSD writes and number of calls to track average latency of OSD writes.
URL: https://tracker.ceph.com/issues/43215 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
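The averaging described above amounts to keeping a running sum and a call count. A minimal sketch, assuming nanosecond timestamps and hypothetical names (`struct latency_sketch`, `latency_update()`, `latency_avg_ns()`) that do not appear in the commit:

```c
#include <stdint.h>

/* Hypothetical per-direction tally (one for reads, one for writes). */
struct latency_sketch {
	uint64_t total;   /* number of completed calls */
	uint64_t sum_ns;  /* sum of per-call round-trip times */
};

/* Record one completed call given its start and end timestamps. */
static void latency_update(struct latency_sketch *m,
			   uint64_t start_ns, uint64_t end_ns)
{
	m->total++;
	m->sum_ns += end_ns - start_ns;
}

/* Average RTT; 0 when no calls have completed yet. */
static uint64_t latency_avg_ns(const struct latency_sketch *m)
{
	return m->total ? m->sum_ns / m->total : 0;
}
```

Storing the sum and count rather than a precomputed average keeps each update to two additions and defers the division to whoever reads the metric.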
|
#
890bd0f8 |
| 18-May-2020 |
Jerry Lee <leisurelysw24@gmail.com> |
libceph: ignore pool overlay and cache logic on redirects
OSD client should ignore cache/overlay flag if got redirect reply. Otherwise, the client hangs when the cache tier is in forward mode.
[ idryomov: Redirects are effectively deprecated and no longer used or tested. The original tiering modes based on redirects are inherently flawed because redirects can race and reorder, potentially resulting in data corruption. The new proxy and readproxy tiering modes should be used instead of forward and readforward. Still marking for stable as obviously correct, though. ]
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/23296 URL: https://tracker.ceph.com/issues/36406 Signed-off-by: Jerry Lee <leisurelysw24@gmail.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12 |
|
#
bb0e681d |
| 30-Aug-2019 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: directly skip to the end of redirect reply
Coverity complains about a double write to *p. Don't bother with osd_instructions and directly skip to the end of redirect reply.
Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
5107d7d5 |
| 29-Jan-2020 |
Xiubo Li <xiubli@redhat.com> |
ceph: move ceph_osdc_{read,write}pages to ceph.ko
Since these helpers are only used by ceph.ko, move them there and rename them with _sync_ qualifiers.
Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
e8862740 |
| 10-Mar-2020 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: fix alloc_msg_with_page_vector() memory leaks
Make it so that CEPH_MSG_DATA_PAGES data item can own pages, fixing a bunch of memory leaks for a page vector allocated in alloc_msg_with_page_vector(). Currently, only watch-notify messages trigger this allocation, and normally the page vector is freed either in handle_watch_notify() or by the caller of ceph_osdc_notify(). But if the message is freed before that (e.g. if the session faults while reading in the message or if the notify is stale), we leak the page vector.
This was supposed to be fixed by switching to a message-owned pagelist, but that never happened.
Fixes: 1907920324f1 ("libceph: support for sending notifies") Reported-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
|
#
78beb0ff |
| 08-Jan-2020 |
Luis Henriques <lhenriques@suse.com> |
ceph: use copy-from2 op in copy_file_range
Instead of using the copy-from operation, switch copy_file_range to the new copy-from2 operation, which allows sending the truncate_seq and truncate_size parameters.
If an OSD does not support the copy-from2 operation it will return -EOPNOTSUPP. In that case, the kernel client will stop trying to do remote object copies for this fs client and will always use the generic VFS copy_file_range.
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
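The fallback described above is a sticky per-client switch: the first -EOPNOTSUPP disables remote copies for good. A rough sketch under that reading, with hypothetical names (`struct fs_client_sketch`, `choose_copy_path()`, `note_copy_result()`) that are not taken from the commit:

```c
#include <errno.h>
#include <stdbool.h>

enum copy_path { COPY_REMOTE, COPY_GENERIC };

/* Hypothetical per-fs-client state. */
struct fs_client_sketch {
	bool remote_copy_disabled;
};

/* Decide which path to try for the next copy_file_range call. */
static enum copy_path choose_copy_path(const struct fs_client_sketch *fsc)
{
	return fsc->remote_copy_disabled ? COPY_GENERIC : COPY_REMOTE;
}

/*
 * Record the result of a remote copy attempt. -EOPNOTSUPP means the
 * OSD does not know the operation, so stop trying for this client.
 */
static void note_copy_result(struct fs_client_sketch *fsc, int ret)
{
	if (ret == -EOPNOTSUPP)
		fsc->remote_copy_disabled = true;
}
```

Other errors leave the switch untouched, so a transient failure does not permanently push the client onto the slower generic path.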
|
Revision tags: v5.2.11, v5.2.10 |
|
#
8edf84ba |
| 21-Aug-2019 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: drop unused con parameter of calc_target()
This bit was omitted from a561372405cf ("libceph: fix PG split vs OSD (re)connect race") to avoid backport conflicts.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2 |
|
#
4766815b |
| 03-Jul-2019 |
David Disseldorp <ddiss@suse.de> |
libceph: handle OSD op ceph_pagelist_append() errors
osd_req_op_cls_init() and osd_req_op_xattr_init() currently propagate ceph_pagelist_alloc() ENOMEM errors but ignore ceph_pagelist_append() memory allocation failures. Add these checks and cleanup on error.
Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
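The pattern being fixed, checking the allocation but not the subsequent appends, can be illustrated in userspace. The sketch below uses a hypothetical fixed-capacity buffer (`struct buf_sketch`) in place of a pagelist; an append that does not fit stands in for a failed page allocation, and the strings in the usage note are illustrative only.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pagelist. */
struct buf_sketch {
	char *data;
	size_t len, cap;
};

static struct buf_sketch *buf_alloc(size_t cap)
{
	struct buf_sketch *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(cap ? cap : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = 0;
	b->cap = cap;
	return b;
}

static void buf_release(struct buf_sketch *b)
{
	free(b->data);
	free(b);
}

/* Append can fail; callers must check the return value. */
static int buf_append(struct buf_sketch *b, const void *src, size_t n)
{
	if (n > b->cap - b->len)
		return -ENOMEM;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Build an op payload; on any append failure, clean up and propagate. */
static int op_cls_init_sketch(struct buf_sketch **out, size_t cap,
			      const char *cls, const char *method)
{
	struct buf_sketch *b = buf_alloc(cap);
	int ret;

	if (!b)
		return -ENOMEM;
	ret = buf_append(b, cls, strlen(cls));
	if (ret)
		goto err;
	ret = buf_append(b, method, strlen(method));
	if (ret)
		goto err;
	*out = b;
	return 0;
err:
	buf_release(b);
	return ret;
}
```

For example, `op_cls_init_sketch(&b, 16, "rbd", "get_id")` succeeds with 9 bytes appended, while the same call with a capacity of 4 returns -ENOMEM and leaves `b` untouched.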
|
#
2cef0ba8 |
| 25-Jul-2019 |
Yan, Zheng <zyan@redhat.com> |
libceph: add function that clears osd client's abort_err
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
120a75ea |
| 25-Jul-2019 |
Yan, Zheng <zyan@redhat.com> |
libceph: add function that reset client's entity addr
This function also re-opens connections to OSD/MON, and re-sends in-flight OSD requests after re-opening connections to OSD.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
a5613724 |
| 20-Aug-2019 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: fix PG split vs OSD (re)connect race
We can't rely on ->peer_features in calc_target() because it may be called both when the OSD session is established and open and when it's not. ->peer_features is not valid unless the OSD session is open. If this happens on a PG split (pg_num increase), that could mean we don't resend a request that should have been resent, hanging the client indefinitely.
In userspace this was fixed by looking at require_osd_release and get_xinfo[osd].features fields of the osdmap. However these fields belong to the OSD section of the osdmap, which the kernel doesn't decode (only the client section is decoded).
Instead, let's drop this feature check. It effectively checks for luminous, so only pre-luminous OSDs would be affected in that on a PG split the kernel might resend a request that should not have been resent. Duplicates can occur in other scenarios, so both sides should already be prepared for them: see dup/replay logic on the OSD side and retry_attempt check on the client side.
Cc: stable@vger.kernel.org Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") Link: https://tracker.ceph.com/issues/41162 Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Jerry Lee <leisurelysw24@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
Revision tags: v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10 |
|
#
4cf3e6df |
| 14-Jun-2019 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: export osd_req_op_data() macro
We already have one exported wrapper around it for extent.osd_data and rbd_object_map_update_finish() needs another one for cls.request_data.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
68ada915 |
| 14-Jun-2019 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: change ceph_osdc_call() to take page vector for response
This will be used for loading object map. rbd_obj_read_sync() isn't suitable because object map must be accessed through class methods.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
94e85771 |
| 08-Jul-2019 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: rename r_unsafe_item to r_private_item
This list item remained from when we had safe and unsafe replies (commit vs ack). It has since become a private list item for use by clients.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v5.1.9, v5.1.8 |
|
#
51fc7ab4 |
| 04-Jun-2019 |
Jeff Layton <jlayton@kernel.org> |
libceph: fix watch_item_t decoding to use ceph_decode_entity_addr
While we're in there, let's also fix up the decoder to do proper bounds checking.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14 |
|
#
b726ec97 |
| 06-May-2019 |
Jeff Layton <jlayton@kernel.org> |
libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer
GCC9 is throwing a lot of warnings about unaligned accesses by callers of ceph_pr_addr. All of the current callers are passing a pointer to the sockaddr inside struct ceph_entity_addr.
Fix it to take a pointer to a struct ceph_entity_addr instead, and then have the function make a copy of the sockaddr before printing it.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
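The approach described above, passing the enclosing structure and copying the sockaddr into an aligned local before use, can be sketched in userspace C. `struct entity_addr_sketch` and `pr_addr_sketch()` are hypothetical; the layout is an assumption chosen only to make the embedded sockaddr a member of a packed structure, and only IPv4 is formatted.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical packed address: members have no alignment guarantee. */
struct entity_addr_sketch {
	uint32_t type;
	uint32_t nonce;
	struct sockaddr_storage in_addr;
} __attribute__((packed));

/* Format the address into buf; returns buf, or NULL on failure. */
static const char *pr_addr_sketch(const struct entity_addr_sketch *addr,
				  char *buf, size_t len)
{
	/* Struct assignment copies the packed member into an aligned local. */
	struct sockaddr_storage ss = addr->in_addr;
	char ip[INET_ADDRSTRLEN];

	if (ss.ss_family == AF_INET) {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)&ss;

		if (!inet_ntop(AF_INET, &in4->sin_addr, ip, sizeof(ip)))
			return NULL;
		snprintf(buf, len, "%s:%u", ip, (unsigned int)ntohs(in4->sin_port));
		return buf;
	}

	snprintf(buf, len, "(unknown sockaddr family %d)", (int)ss.ss_family);
	return buf;
}
```

Callers never take the address of the packed member themselves, so the unaligned-pointer warning has nowhere to arise; the cost is one structure copy per call.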
|
Revision tags: v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5 |
|
#
d75f773c |
| 25-Mar-2019 |
Sakari Ailus <sakari.ailus@linux.intel.com> |
treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant.
The changes have been produced by the following command:
git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
And verifying the result.
Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
|
Revision tags: v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11 |
|
#
02b2f549 |
| 18-Dec-2018 |
Dongsheng Yang <dongsheng.yang@easystack.cn> |
libceph: allow setting abort_on_full for rbd
Introduce a new option abort_on_full, default to false. Then we can get -ENOSPC when the pool is full, or reaches quota.
[ Don't show abort_on_full in /proc/mounts. ]
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
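The behaviour of the option can be summarized as a small decision function. This is a sketch of the described semantics only, with hypothetical names (`enum full_action`, `check_pool_full_sketch()`); it assumes that without abort_on_full a write against a full pool waits rather than fails.

```c
#include <errno.h>
#include <stdbool.h>

/* Non-error outcomes; errors are returned as negative errno values. */
enum full_action { FULL_PROCEED = 0, FULL_WAIT = 1 };

/*
 * Decide what to do with a write when the pool may be full or over
 * quota. With abort_on_full set, fail fast with -ENOSPC; otherwise
 * the request is held until space becomes available (assumed default).
 */
static int check_pool_full_sketch(bool pool_full, bool abort_on_full)
{
	if (!pool_full)
		return FULL_PROCEED;
	return abort_on_full ? -ENOSPC : FULL_WAIT;
}
```

Failing fast suits callers that can surface -ENOSPC to the user, while the default suits callers that would rather block than see a write error.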
|
Revision tags: v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15 |
|
#
23ddf9be |
| 15-Oct-2018 |
Luis Henriques <lhenriques@suse.com> |
libceph: support the RADOS copy-from operation
Add support for performing remote object copies using the 'copy-from' operation.
[ Add COPY_FROM to get_num_data_items(). ]
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|