#
7cca78c9 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: replace ceph_monc_request_next_osdmap()
... with a wrapper around maybe_request_map() - no need for two osdmap-specific functions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
4609245e |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: pool deletion detection
This adds the "map check" infrastructure for sending osdmap version checks on CALC_TARGET_POOL_DNE and completing in-flight requests with -ENOENT if the target pool doesn't exist or has just been deleted.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
b07d3c4b |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: support for checking on status of watch
Implement ceph_osdc_watch_check() to be able to check on status of watch. Note that the time it takes for a watch/notify event to get delivered through the notify_wq is taken into account.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
19079203 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: support for sending notifies
Implement ceph_osdc_notify() for sending notifies.
Due to the fact that the current messenger can't do read-in into pagelists (it can only do write-out from them), I had to go with a page vector for a NOTIFY_COMPLETE payload, for now.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
922dab61 |
| 25-May-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph, rbd: ceph_osd_linger_request, watch/notify v2
This adds support and switches rbd to a new, more reliable version of watch/notify protocol. As with the OSD client update, this is mostly about getting the right structures linked into the right places so that reconnects are properly sent when needed. watch/notify v2 also requires sending regular pings to the OSDs - send_linger_ping().
A major change from the old watch/notify implementation is the introduction of ceph_osd_linger_request - linger requests no longer piggy back on ceph_osd_request. ceph_osd_event has been merged into ceph_osd_linger_request.
All the details are now hidden within libceph, the interface consists of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack(). ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep the lifetime management simple.
ceph_osdc_notify_ack() accepts an optional data payload, which is relayed back to the notifier.
Portions of this patch are loosely based on work by Douglas Fuller <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
42b06965 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: wait_request_timeout()
The unwatch timeout is currently implemented in rbd. With watch/unwatch code moving into libceph, we are going to need a ceph_osdc_wait_request() variant with a timeout.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
3540bfdb |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: request_init() and request_release_checks()
These are going to be used by request_reinit() code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
5aea3dcd |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: a major OSD client update
This is a major sync up, up to ~Jewel. The highlights are:
- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) support
The switchover is incomplete: lingering requests can be set up and torn down but aren't ever reestablished. This functionality is restored with the introduction of the new lingering infrastructure (ceph_osd_linger_request, linger_work, etc) in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
9dd2845c |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: protect osdc->osd_lru list with a spinlock
OSD client is getting moved from the big per-client lock to a set of per-session locks. The big rwlock would only be held for read most of the time, so a global osdc->osd_lru needs additional protection.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
7a28f59b |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: allocate ceph_osd with GFP_NOFAIL
create_osd() is called way too deep in the stack to be able to error out in a sane way; a failing create_osd() just messes everything up. The current req_notarget list solution is broken - the list is never traversed as it's not entirely clear when to do it, I guess.
If we were to start traversing it at regular intervals and retrying each request, we wouldn't be far off from what __GFP_NOFAIL is doing, so allocate OSD sessions with __GFP_NOFAIL, at least until we come up with a better fix.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
0247a0cf |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: osd_init() and osd_cleanup()
These are going to be used by homeless OSD sessions code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
42c1b124 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: handle_one_map()
Separate osdmap handling from decoding and iterating over a bag of maps in a fresh MOSDMap message. This sets up the scene for the updated OSD client.
Of particular importance here is the addition of pi->was_full, which can be used to answer "did this pool go full -> not-full in this map?". This is the key bit for supporting pool quotas.
We won't be able to downgrade map_sem for much longer, so drop downgrade_write().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
e5253a7b |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: allocate dummy osdmap in ceph_osdc_init()
This leads to simpler osdmap handling code, particularly when dealing with pi->was_full, which is introduced in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
fbca9635 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: schedule tick from ceph_osdc_init()
Both homeless OSD sessions and watch/notify v2, introduced in later commits, require periodic ticks which don't depend on ->num_requests. Schedule the initial tick from ceph_osdc_init() and reschedule from handle_timeout() unconditionally.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
b37ee1b9 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: move schedule_delayed_work() in ceph_osdc_init()
ceph_osdc_stop() isn't called if ceph_osdc_init() fails, so we end up with handle_osds_timeout() running on invalid memory if any one of the allocations fails. Call schedule_delayed_work() after everything is set up, just before returning.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
fe5da05e |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: redo callbacks and factor out MOSDOpReply decoding
If you specify ACK | ONDISK and set ->r_unsafe_callback, both ->r_callback and ->r_unsafe_callback(true) are called on ack. This is very confusing. Redo this so that only one of them is called:
->r_unsafe_callback(true), on ack
->r_unsafe_callback(false), on commit
or
->r_callback, on ack|commit
Decode everything in decode_MOSDOpReply() to reduce clutter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
85e084fe |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: drop msg argument from ceph_osdc_callback_t
finish_read(), its only user, uses it to get to hdr.data_len, which is what ->r_result is set to on success. This gains us the ability to safely call callbacks from contexts other than reply, e.g. map check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
bb873b53 |
| 25-May-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: switch to calc_target(), part 2
The crux of this is getting rid of ceph_osdc_build_request(), so that MOSDOp can be encoded not before but after calc_target() calculates the actual target. Encoding now happens within ceph_osdc_start_request().
Also nuked is the accompanying bunch of pointers into the encoded buffer that was used to update fields on each send - instead, the entire front is re-encoded. If we want to support target->name_len != base->name_len in the future, there is no other way, because oid is surrounded by other fields in the encoded buffer.
Encoding OSD ops and adding data items to the request message were mixed together in osd_req_encode_op(). While we want to re-encode OSD ops, we don't want to add duplicate data items to the message when resending, so all calls to ceph_osdc_msg_data_add() are factored out into a new setup_request_data().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
a66dd383 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: switch to calc_target(), part 1
Replace __calc_request_pg() and most of __map_request() with calc_target() and start using req->r_t.
ceph_osdc_build_request() however still encodes base_oid, because it's called before calc_target() is and target_oid is empty at that point in time; a printf in osdc_show() also shows base_oid. This is fixed in "libceph: switch to calc_target(), part 2".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
63244fa1 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: introduce ceph_osd_request_target, calc_target()
Introduce ceph_osd_request_target, containing all mapping-related fields of ceph_osd_request and calc_target() for calculating mappings and populating it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
6f3bfd45 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: ceph_osds, ceph_pg_to_up_acting_osds()
Knowing just the acting set isn't enough, we need to be able to record the up set as well to detect interval changes. This means returning (up[], up_len, up_primary, acting[], acting_len, acting_primary) and passing it around. Introduce and switch to ceph_osds to help with that.
Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return both up and acting sets from it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
d9591f5e |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: rename ceph_oloc_oid_to_pg()
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise that the returned PG is raw and return -ENOENT instead of -EIO if the pool doesn't exist.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
fcd00b68 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: DEFINE_RB_FUNCS macro
Given
struct foo { u64 id; struct rb_node bar_node; };
generate insert_bar(), erase_bar() and lookup_bar() functions with
DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)
The key is assumed to be an integer (u64, int, etc), compared with < and >. nodefld has to be initialized with RB_CLEAR_NODE().
Start using it for MDS, MON and OSD requests and OSD sessions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
42a2c09f |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: open-code remove_{all,old}_osds()
They are called only once, from ceph_osdc_stop() and handle_osds_timeout() respectively.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
0c0a8de1 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: nuke unused fields and functions
Either unused or useless:
osdmap->mkfs_epoch
osd->o_marked_for_keepalive
monc->num_generic_requests
osdc->map_waiters
osdc->last_requested_map
osdc->timeout_tid
osd_req_op_cls_response_data()
osdmap_apply_incremental() @msgr arg
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|