#
a02a946d |
| 19-Jun-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: respect RADOS_BACKOFF backoffs Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
df28152d |
| 15-Jun-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: avoid unnecessary pi lookups in calc_target() Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
7de030d6 |
| 15-Jun-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: resend on PG splits if OSD has RESEND_ON_SPLIT Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: Ilya Dryomov <idryomov
libceph: resend on PG splits if OSD has RESEND_ON_SPLIT Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
dc98ff72 |
| 15-Jun-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: introduce ceph_spg, ceph_pg_to_primary_shard() Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
8e48cf00 |
| 05-Jun-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: new pi->last_force_request_resend The old (v15) pi->last_force_request_resend has been repurposed to make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do o
libceph: new pi->last_force_request_resend The old (v15) pi->last_force_request_resend has been repurposed to make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do obey pi->last_force_request_resend resend on splits. See ceph.git commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend ops on split"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
ca35ffea |
| 05-Jun-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: handle non-empty dest in ceph_{oloc,oid}_copy() Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
293dffaa |
| 23-May-2017 |
Dan Carpenter <dan.carpenter@oracle.com> |
libceph: NULL deref on crush_decode() error path If there is not enough space then ceph_decode_32_safe() does a goto bad. We need to return an error code in that situation. The current
libceph: NULL deref on crush_decode() error path If there is not enough space then ceph_decode_32_safe() does a goto bad. We need to return an error code in that situation. The current code returns ERR_PTR(0) which is NULL. The callers are not expecting that and it results in a NULL dereference. Fixes: f24e9980eb86 ("ceph: OSD client") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2 |
|
#
b581a585 |
| 01-Mar-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't set weight to IN when OSD is destroyed Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info, osd_xinfo on osd deletion"), weight is set to IN when OSD is delete
libceph: don't set weight to IN when OSD is destroyed Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info, osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted. This changes the result of applying an incremental for clients, not just OSDs. Because CRUSH computations are obviously affected, pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on object placement, resulting in misdirected requests. Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f. Fixes: 930c53286977 ("libceph: apply new_state before new_up_client on incrementals") Link: http://tracker.ceph.com/issues/19122 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
show more ...
|
#
9afd30db |
| 28-Feb-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: fix crush_decode() for older maps Older (shorter) CRUSH maps too need to be finalized. Fixes: 66a0e2d579db ("crush: remove mutable part of CRUSH map") Signed-off-by: Il
libceph: fix crush_decode() for older maps Older (shorter) CRUSH maps too need to be finalized. Fixes: 66a0e2d579db ("crush: remove mutable part of CRUSH map") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v4.10.1, v4.10 |
|
#
ef9324bb |
| 08-Feb-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't go through with the mapping if the PG is too wide With EC overwrites maturing, the kernel client will be getting exposed to potentially very wide EC pools. While "min(pi-
libceph: don't go through with the mapping if the PG is too wide With EC overwrites maturing, the kernel client will be getting exposed to potentially very wide EC pools. While "min(pi->size, X)" works fine when the cluster is stable and happy, truncating OSD sets interferes with resend logic (ceph_is_new_interval(), etc). Abort the mapping if the pool is too wide, assigning the request to the homeless session. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
show more ...
|
#
743efcff |
| 31-Jan-2017 |
Ilya Dryomov <idryomov@gmail.com> |
crush: merge working data and scratch Much like Arlo Guthrie, I decided that one big pile is better than two little piles. Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf
crush: merge working data and scratch Much like Arlo Guthrie, I decided that one big pile is better than two little piles. Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
66a0e2d5 |
| 31-Jan-2017 |
Ilya Dryomov <idryomov@gmail.com> |
crush: remove mutable part of CRUSH map Then add it to the working state. It would be very nice if we didn't have to take a lock to calculate a crush placement. By moving the permuta
crush: remove mutable part of CRUSH map Then add it to the working state. It would be very nice if we didn't have to take a lock to calculate a crush placement. By moving the permutation array into the working data, we can treat the CRUSH map as immutable. Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
1b6a78b5 |
| 31-Jan-2017 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: add osdmap_set_crush() helper Simplify osdmap_decode() and osdmap_apply_incremental() a bit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9, v4.4.8, v4.4.7, openbmc-20160329-2, openbmc-20160329-1, openbmc-20160321-1, v4.4.6, v4.5, v4.4.5, v4.4.4, v4.4.3, openbmc-20160222-1, v4.4.2 |
|
#
30c156d9 |
| 13-Feb-2016 |
Yan, Zheng <zyan@redhat.com> |
libceph: rados pool namespace support Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's
libceph: rados pool namespace support Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: openbmc-20160212-1, openbmc-20160210-1 |
|
#
7627151e |
| 03-Feb-2016 |
Yan, Zheng <zyan@redhat.com> |
libceph: define new ceph_file_layout structure Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace
libceph: define new ceph_file_layout structure Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
show more ...
|
#
930c5328 |
| 18-Jul-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: apply new_state before new_up_client on incrementals Currently, osd_weight and osd_state fields are updated in the encoding order. This is wrong, because an incremental map may
libceph: apply new_state before new_up_client on incrementals Currently, osd_weight and osd_state fields are updated in the encoding order. This is wrong, because an incremental map may look like e.g. new_up_client: { osd=6, addr=... } # set osd_state and addr new_state: { osd=6, xorstate=EXISTS } # clear osd_state Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down). After applying new_up_client, osd_state is changed to EXISTS | UP. Carrying on with the new_state update, we flip EXISTS and leave osd6 in a weird "!EXISTS but UP" state. A non-existent OSD is considered down by the mapping code 2087 for (i = 0; i < pg->pg_temp.len; i++) { 2088 if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) { 2089 if (ceph_can_shift_osds(pi)) 2090 continue; 2091 2092 temp->osds[temp->size++] = CRUSH_ITEM_NONE; and so requests get directed to the second OSD in the set instead of the first, resulting in OSD-side errors like: [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680 and hung rbds on the client: [ 493.566367] rbd: rbd0: write 400000 at 11cc00000 (0) [ 493.566805] rbd: rbd0: result -6 xferred 400000 [ 493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688 The fix is to decouple application from the decoding and: - apply new_weight first - apply new_state before new_up_client - twiddle osd_state flags if marking in - clear out some of the state if osd is destroyed Fixes: http://tracker.ceph.com/issues/14901 Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
show more ...
|
#
4a3262b1 |
| 30-May-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: use %s instead of %pE in dout()s Commit d30291b985d1 ("libceph: variable-sized ceph_object_id") changed dout()s in what is now encode_request() and ceph_object_locator_to_pg()
libceph: use %s instead of %pE in dout()s Commit d30291b985d1 ("libceph: variable-sized ceph_object_id") changed dout()s in what is now encode_request() and ceph_object_locator_to_pg() to use %pE, mostly to document that, although all rbd and cephfs object names are NULL-terminated strings, ceph_object_id will handle any RADOS object name, including the one containing NULs, just fine. However, it turns out that vbin_printf() can't handle anything but ints and %s - all %p suffixes are ignored. The buffer %p** points to isn't recorded, resulting in trash in the messages if the buffer had been reused by the time bstr_printf() got to it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
e5253a7b |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: allocate dummy osdmap in ceph_osdc_init() This leads to a simpler osdmap handling code, particularly when dealing with pi->was_full, which is introduced in a later commit.
libceph: allocate dummy osdmap in ceph_osdc_init() This leads to a simpler osdmap handling code, particularly when dealing with pi->was_full, which is introduced in a later commit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
63244fa1 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: introduce ceph_osd_request_target, calc_target() Introduce ceph_osd_request_target, containing all mapping-related fields of ceph_osd_request and calc_target() for calculating m
libceph: introduce ceph_osd_request_target, calc_target() Introduce ceph_osd_request_target, containing all mapping-related fields of ceph_osd_request and calc_target() for calculating mappings and populating it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
04812acf |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: pi->min_size, pi->last_force_request_resend Add and decode pi->min_size and pi->last_force_request_resend. These are going to be used by calc_target(). Signed-off-by:
libceph: pi->min_size, pi->last_force_request_resend Add and decode pi->min_size and pi->last_force_request_resend. These are going to be used by calc_target(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
f984cb76 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: make pgid_cmp() global calc_target() code is going to need to know how to compare PGs. Take lhs and rhs pgid by const * while at it. Signed-off-by: Ilya Dryomov <idryo
libceph: make pgid_cmp() global calc_target() code is going to need to know how to compare PGs. Take lhs and rhs pgid by const * while at it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
f81f1633 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: rename ceph_calc_pg_primary() Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to emphasise that it returns acting primary. Signed-off-by: Ilya Dryomov <idr
libceph: rename ceph_calc_pg_primary() Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to emphasise that it returns acting primary. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
6f3bfd45 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: ceph_osds, ceph_pg_to_up_acting_osds() Knowning just acting set isn't enough, we need to be able to record up set as well to detect interval changes. This means returning (up[]
libceph: ceph_osds, ceph_pg_to_up_acting_osds() Knowning just acting set isn't enough, we need to be able to record up set as well to detect interval changes. This means returning (up[], up_len, up_primary, acting[], acting_len, acting_primary) and passing it around. Introduce and switch to ceph_osds to help with that. Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return both up and acting sets from it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
d9591f5e |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: rename ceph_oloc_oid_to_pg() Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise that returned is raw PG and return -ENOENT instead of -EIO if the pool d
libceph: rename ceph_oloc_oid_to_pg() Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg(). Emphasise that returned is raw PG and return -ENOENT instead of -EIO if the pool doesn't exist. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
0c0a8de1 |
| 28-Apr-2016 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: nuke unused fields and functions Either unused or useless: osdmap->mkfs_epoch osd->o_marked_for_keepalive monc->num_generic_requests osdc->map_w
libceph: nuke unused fields and functions Either unused or useless: osdmap->mkfs_epoch osd->o_marked_for_keepalive monc->num_generic_requests osdc->map_waiters osdc->last_requested_map osdc->timeout_tid osd_req_op_cls_response_data() osdmap_apply_incremental() @msgr arg Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|