Revision tags: v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8 |
|
#
8008ab10 |
| 24-Mar-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: return primary from ceph_calc_pg_acting()
In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call
libceph: return primary from ceph_calc_pg_acting()
In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
show more ...
|
Revision tags: v3.14-rc7 |
|
#
a2505d63 |
| 13-Mar-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: split osdmap allocation and decode steps
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> R
libceph: split osdmap allocation and decode steps
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
show more ...
|
Revision tags: v3.14-rc6, v3.14-rc5 |
|
#
c647b8a8 |
| 25-Feb-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op
This is primarily for rbd's benefit and is supposed to combat fragmentation:
"... knowing that rbd images have a 4m size, librbd can pass a
libceph: add support for CEPH_OSD_OP_SETALLOCHINT osd op
This is primarily for rbd's benefit and is supposed to combat fragmentation:
"... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit."
SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
show more ...
|
#
7b25bf5f |
| 25-Feb-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: encode CEPH_OSD_OP_FLAG_* op flags
Encode ceph_osd_op::flags field so that it gets sent over the wire.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@i
libceph: encode CEPH_OSD_OP_FLAG_* op flags
Encode ceph_osd_op::flags field so that it gets sent over the wire.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
show more ...
|
Revision tags: v3.14-rc4, v3.14-rc3 |
|
#
2045ceae |
| 12-Feb-2014 |
stephen hemminger <stephen@networkplumber.org> |
net: remove unnecessary return's
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code.
I suppose some coc
net: remove unnecessary return's
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code.
I suppose some coccinelle wizardy could do this automatically.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.14-rc2 |
|
#
ff513ace |
| 03-Feb-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: take map_sem for read in handle_reply()
Handling redirect replies requires both map_sem and request_mutex. Taking map_sem unconditionally near the top of handle_reply() avoids possible race
libceph: take map_sem for read in handle_reply()
Handling redirect replies requires both map_sem and request_mutex. Taking map_sem unconditionally near the top of handle_reply() avoids possible race conditions that arise from releasing request_mutex to be able to acquire map_sem in redirect reply case. (Lock ordering is: map_sem, request_mutex, crush_mutex.)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
Revision tags: v3.14-rc1 |
|
#
0bbfdfe8 |
| 31-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: factor out logic from ceph_osdc_start_request()
Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to tak
libceph: factor out logic from ceph_osdc_start_request()
Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to taking locks and calling __ceph_osdc_start_request().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
c172ec5c |
| 31-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: fix error handling in ceph_osdc_init()
msgpool_op_reply message pool isn't destroyed if workqueue construction fails. Fix it.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Review
libceph: fix error handling in ceph_osdc_init()
msgpool_op_reply message pool isn't destroyed if workqueue construction fails. Fix it.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
205ee118 |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: follow redirect replies from osds
Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034.
v1 (current) version of redirect reply consis
libceph: follow redirect replies from osds
Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034.
v1 (current) version of redirect reply consists of oloc and oid, which expands to pool, key, nspace, hash and oid. However, server-side code that would populate anything other than pool doesn't exist yet, and hence this commit adds support for pool redirects only. To make sure that future server-side updates don't break us, we decode all fields and, if any of key, nspace, hash or oid have a non-default value, error out with "corrupt osd_op_reply ..." message.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
3c972c95 |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects.
Signed-of
libceph: rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid}
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
17a13e40 |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: follow {read,write}_tier fields on osd request submission
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and write_tier for write and read+write ops (aka basic tiering
libceph: follow {read,write}_tier fields on osd request submission
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and write_tier for write and read+write ops (aka basic tiering support). {read,write}_tier are part of pg_pool_t since v9. This commit bumps our pg_pool_t decode compat version from v7 to v9, all new fields except for {read,write}_tier are ignored.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
7c13cb64 |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear.
Si
libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
4295f221 |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: introduce and start using oid abstraction
In preparation for tiering support, which would require having two (base and target) object names for each osd request and also copying those names
libceph: introduce and start using oid abstraction
In preparation for tiering support, which would require having two (base and target) object names for each osd request and also copying those names around, introduce struct ceph_object_id (oid) and a couple helpers to facilitate those copies and encapsulate the fact that object name is not necessarily a NUL-terminated string.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
2d0ebc5d |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN.
Signed-off-by: Ilya Dryomov <ilya.dryomov@in
libceph: rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
22116525 |
| 27-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: start using oloc abstraction
Instead of relying on pool fields in ceph_file_layout (for mapping) and ceph_pg (for enconding), start using ceph_object_locator (oloc) abstraction. Note that
libceph: start using oloc abstraction
Instead of relying on pool fields in ceph_file_layout (for mapping) and ceph_pg (for enconding), start using ceph_object_locator (oloc) abstraction. Note that userspace oloc currently consists of pool, key, nspace and hash fields, while this one contains only a pool. This is OK, because at this point we only send (i.e. encode) olocs and never have to receive (i.e. decode) them.
This makes keeping a copy of ceph_file_layout in every osd request unnecessary, so ceph_osd_request::r_file_layout field is nuked.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
Revision tags: v3.13 |
|
#
0b4af2e8 |
| 16-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: dout() is missing a newline
Add a missing newline to a dout() in __reset_osd().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
|
Revision tags: v3.13-rc8 |
|
#
f2be82b0 |
| 09-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: fix preallocation check in get_reply()
The check that makes sure that we have enough memory allocated to read in the entire header of the message in question is currently busted. It compare
libceph: fix preallocation check in get_reply()
The check that makes sure that we have enough memory allocated to read in the entire header of the message in question is currently busted. It compares front_len of the incoming message with iov_len field of ceph_msg::front structure, which is used primarily to indicate the amount of data already read in, and not the size of the allocated buffer. Under certain conditions (e.g. a short read from a socket followed by that socket's shutdown and owning ceph_connection reset) this results in a warning similar to
[85688.975866] libceph: get_reply front 198 > preallocated 122 (4#0)
and, through another bug, leads to forever hung tasks and forced reboots. Fix this by comparing front_len with front_alloc_len field of struct ceph_msg, which stores the actual size of the buffer.
Fixes: http://tracker.ceph.com/issues/5425
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
3f0a4ac5 |
| 09-Jan-2014 |
Ilya Dryomov <ilya.dryomov@inktank.com> |
libceph: rename front to front_len in get_reply()
Rename front local variable to front_len in get_reply() to make its purpose more clear.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Revi
libceph: rename front to front_len in get_reply()
Rename front local variable to front_len in get_reply() to make its purpose more clear.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
Revision tags: v3.13-rc7, v3.13-rc6, v3.13-rc5, v3.13-rc4 |
|
#
9a1ea2db |
| 10-Dec-2013 |
Josh Durgin <josh.durgin@inktank.com> |
libceph: resend all writes after the osdmap loses the full flag
With the current full handling, there is a race between osds and clients getting the first map marked full. If the osd wins, it will r
libceph: resend all writes after the osdmap loses the full flag
With the current full handling, there is a race between osds and clients getting the first map marked full. If the osd wins, it will return -ENOSPC to any writes, but the client may already have writes in flight. This results in the client getting the error and propagating it up the stack. For rbd, the block layer turns this into EIO, which can cause corruption in filesystems above it.
To avoid this race, osds are being changed to drop writes that came from clients with an osdmap older than the last osdmap marked full. In order for this to work, clients must resend all writes after they encounter a full -> not full transition in the osdmap. osds will wait for an updated map instead of processing a request from a client with a newer map, so resent writes will not be dropped by the osd unless there is another not full -> full transition.
This approach requires both osds and clients to be fixed to avoid the race. Old clients talking to osds with this fix may hang instead of returning EIO and potentially corrupting an fs. New clients talking to old osds have the same behavior as before if they encounter this race.
Fixes: http://tracker.ceph.com/issues/6938
Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
show more ...
|
Revision tags: v3.13-rc3 |
|
#
d29adb34 |
| 02-Dec-2013 |
Josh Durgin <josh.durgin@inktank.com> |
libceph: block I/O when PAUSE or FULL osd map flags are set
The PAUSEWR and PAUSERD flags are meant to stop the cluster from processing writes and reads, respectively. The FULL flag is set when the
libceph: block I/O when PAUSE or FULL osd map flags are set
The PAUSEWR and PAUSERD flags are meant to stop the cluster from processing writes and reads, respectively. The FULL flag is set when the cluster determines that it is out of space, and will no longer process writes. PAUSEWR and PAUSERD are purely client-side settings already implemented in userspace clients. The osd does nothing special with these flags.
When the FULL flag is set, however, the osd responds to all writes with -ENOSPC. For cephfs, this makes sense, but for rbd the block layer translates this into EIO. If a cluster goes from full to non-full quickly, a filesystem on top of rbd will not behave well, since some writes succeed while others get EIO.
Fix this by blocking any writes when the FULL flag is set in the osd client. This is the same strategy used by userspace, so apply it by default. A follow-on patch makes this configurable.
__map_request() is called to re-target osd requests in case the available osds changed. Add a paused field to a ceph_osd_request, and set it whenever an appropriate osd map flag is set. Avoid queueing paused requests in __map_request(), but force them to be resent if they become unpaused.
Also subscribe to the next osd map from the monitor if any of these flags are set, so paused requests can be unblocked as soon as possible.
Fixes: http://tracker.ceph.com/issues/6079
Reviewed-by: Sage Weil <sage@inktank.com> Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
show more ...
|
Revision tags: v3.13-rc2 |
|
#
37c89bde |
| 27-Nov-2013 |
Li Wang <liwang@ubuntukylin.com> |
ceph: Add necessary clean up if invalid reply received in handle_reply()
Wake up possible waiters, invoke the call back if any, unregister the request
Signed-off-by: Li Wang <liwang@ubuntukylin.com
ceph: Add necessary clean up if invalid reply received in handle_reply()
Wake up possible waiters, invoke the call back if any, unregister the request
Signed-off-by: Li Wang <liwang@ubuntukylin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by: Sage Weil <sage@inktank.com>
show more ...
|
Revision tags: v3.13-rc1, v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11 |
|
#
dd935f44 |
| 28-Aug-2013 |
Josh Durgin <josh.durgin@inktank.com> |
libceph: add function to ensure notifies are complete
Without a way to flush the osd client's notify workqueue, a watch event that is unregistered could continue receiving callbacks indefinitely.
U
libceph: add function to ensure notifies are complete
Without a way to flush the osd client's notify workqueue, a watch event that is unregistered could continue receiving callbacks indefinitely.
Unregistering the event simply means no new notifies are added to the queue, but there may still be events in the queue that will call the watch callback for the event. If the queue is flushed after the event is unregistered, the caller can be sure no more watch callbacks will occur for the canceled watch.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
show more ...
|
Revision tags: v3.11-rc7, v3.11-rc6 |
|
#
dbcae088 |
| 15-Aug-2013 |
Dan Carpenter <dan.carpenter@oracle.com> |
libceph: create_singlethread_workqueue() doesn't return ERR_PTRs
create_singlethread_workqueue() returns NULL on error, and it doesn't return ERR_PTRs.
I tweaked the error handling a little to be c
libceph: create_singlethread_workqueue() doesn't return ERR_PTRs
create_singlethread_workqueue() returns NULL on error, and it doesn't return ERR_PTRs.
I tweaked the error handling a little to be consistent with earlier in the function.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
b72e19b9 |
| 15-Aug-2013 |
Dan Carpenter <dan.carpenter@oracle.com> |
libceph: potential NULL dereference in ceph_osdc_handle_map()
There are two places where we read "nr_maps" if both of them are set to zero then we would hit a NULL dereference here.
Signed-off-by:
libceph: potential NULL dereference in ceph_osdc_handle_map()
There are two places where we read "nr_maps" if both of them are set to zero then we would hit a NULL dereference here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|
#
18741196 |
| 15-Aug-2013 |
Dan Carpenter <dan.carpenter@oracle.com> |
libceph: fix error handling in handle_reply()
We've tried to fix the error paths in this function before, but there is still a hidden goto in the ceph_decode_need() macro which goes to the wrong pla
libceph: fix error handling in handle_reply()
We've tried to fix the error paths in this function before, but there is still a hidden goto in the ceph_decode_need() macro which goes to the wrong place. We need to release the "req" and unlock a mutex before returning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sage Weil <sage@inktank.com>
show more ...
|