History log of /openbmc/linux/net/ceph/osdmap.c (Results 101 – 125 of 276)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# a2505d63 13-Mar-2014 Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: split osdmap allocation and decode steps

Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
R

libceph: split osdmap allocation and decode steps

Split osdmap allocation and initialization into a separate function,
ceph_osdmap_decode().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

show more ...


# 38a8d560 13-Mar-2014 Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: dump osdmap and enhance output on decode errors

Dump osdmap in hex on both full and incremental decode errors, to make
it easier to match the contents with error offset. dout() map epoch
a

libceph: dump osdmap and enhance output on decode errors

Dump osdmap in hex on both full and incremental decode errors, to make
it easier to match the contents with error offset. dout() map epoch
and max_osd value on success.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

show more ...


Revision tags: v3.14-rc6, v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1
# 9d521470 31-Jan-2014 Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: a per-osdc crush scratch buffer

With the addition of erasure coding support in the future, scratch
variable-length array in crush_do_rule_ary() is going to grow to at
least 200 bytes on ave

libceph: a per-osdc crush scratch buffer

With the addition of erasure coding support in the future, scratch
variable-length array in crush_do_rule_ary() is going to grow to at
least 200 bytes on average, on top of another 128 bytes consumed by
rawosd/osd arrays in the call chain. Replace it with a buffer inside
struct osdmap and a mutex. This shouldn't result in any contention,
because all osd requests were already serialized by request_mutex at
that point; the only unlocked caller was ceph_ioctl_get_dataloc().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


# 17a13e40 27-Jan-2014 Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: follow {read,write}_tier fields on osd request submission

Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and
write_tier for write and read+write ops (aka basic tiering

libceph: follow {read,write}_tier fields on osd request submission

Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and
write_tier for write and read+write ops (aka basic tiering support).
{read,write}_tier are part of pg_pool_t since v9. This commit bumps
our pg_pool_t decode compat version from v7 to v9, all new fields
except for {read,write}_tier are ignored.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


# ce7f6a27 27-Jan-2014 Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: add ceph_pg_pool_by_id()

"Lookup pool info by ID" function is hidden in osdmap.c. Expose it to
the rest of libceph.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sag

libceph: add ceph_pg_pool_by_id()

"Lookup pool info by ID" function is hidden in osdmap.c. Expose it to
the rest of libceph.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


# 7c13cb64 27-Jan-2014 Ilya Dryomov <ilya.dryomov@inktank.com>

libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()

Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename
it to ceph_oloc_oid_to_pg() to make its purpose more clear.

Si

libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()

Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename
it to ceph_oloc_oid_to_pg() to make its purpose more clear.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


Revision tags: v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6
# e8ef19c4 24-Dec-2013 Ilya Dryomov <ilya.dryomov@inktank.com>

crush: eliminate CRUSH_MAX_SET result size limitation

This is only present to size the temporary scratch arrays that we put on
the stack. Let the caller allocate them as they wish and remove the
li

crush: eliminate CRUSH_MAX_SET result size limitation

This is only present to size the temporary scratch arrays that we put on
the stack. Let the caller allocate them as they wish and remove the
limitation.

Reflects ceph.git commit 1cfe140bf2dab99517589a82a916f4c75b9492d1.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


# b3b33b0e 24-Dec-2013 Ilya Dryomov <ilya.dryomov@inktank.com>

crush: pass weight vector size to map function

Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end. This can happen if the caller misbehaves
a

crush: pass weight vector size to map function

Pass the size of the weight vector into crush_do_rule() to ensure that we
don't access values past the end. This can happen if the caller misbehaves
and passes a weight vector that is smaller than max_devices.

Currently the monitor tries to prevent that from happening, but this will
gracefully tolerate previous bad osdmaps that got into this state. It's
also a bit more defensive.

Reflects ceph.git commit 5922e2c2b8335b5e46c9504349c3a55b7434c01a.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


Revision tags: v3.13-rc5, v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11
# 9542cf0b 28-Aug-2013 Sage Weil <sage@inktank.com>

libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc

Fix a typo that used the wrong bitmask for the pg.seed calculation. This
is normally unnoticed because in most cases pg_num == pgp_

libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc

Fix a typo that used the wrong bitmask for the pg.seed calculation. This
is normally unnoticed because in most cases pg_num == pgp_num. It is, however,
a bug that is easily corrected.

CC: stable@vger.kernel.org
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <alex.elder@linary.org>

show more ...


Revision tags: v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6
# ef4859d6 01-Apr-2013 Alex Elder <elder@inktank.com>

libceph: define ceph_decode_pgid() only once

There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c". Get ri

libceph: define ceph_decode_pgid() only once

There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c". Get rid of both, and instead define
a single inline version in "include/linux/ceph/osdmap.h".

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

show more ...


Revision tags: v3.9-rc5, v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1
# 41766f87 01-Mar-2013 Alex Elder <elder@inktank.com>

libceph: rename ceph_calc_object_layout()

The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object

libceph: rename ceph_calc_object_layout()

The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure. Only update the ceph_pg if the pool is found
in the osd map. Get rid of few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

show more ...


# d6c0dd6b 06-Mar-2013 Sage Weil <sage@inktank.com>

libceph: fix decoding of pgids

In 4f6a7e5ee1393ec4b243b39dac9f36992d161540 we effectively dropped support
for the legacy encoding for the OSDMap and incremental. However, we didn't
fix the decoding

libceph: fix decoding of pgids

In 4f6a7e5ee1393ec4b243b39dac9f36992d161540 we effectively dropped support
for the legacy encoding for the OSDMap and incremental. However, we didn't
fix the decoding for the pgid.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

show more ...


# 83ca14fd 26-Feb-2013 Sage Weil <sage@inktank.com>

libceph: add support for HASHPSPOOL pool flag

The legacy behavior adds the pgid seed and pool together as the input for
CRUSH. That is problematic because each pool's PGs end up mapping to the
same

libceph: add support for HASHPSPOOL pool flag

The legacy behavior adds the pgid seed and pool together as the input for
CRUSH. That is problematic because each pool's PGs end up mapping to the
same OSDs: 1.5 == 2.4 == 3.3 == ...

Instead, if the HASHPSPOOL flag is set, we has the ps and pool together and
feed that into CRUSH. This ensures that two adjacent pools will map to
an independent pseudorandom set of OSDs.

Advertise our support for this via a protocol feature flag.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


# 2169aea6 25-Feb-2013 Sage Weil <sage@inktank.com>

libceph: calculate placement based on the internal data types

Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type. This al

libceph: calculate placement based on the internal data types

Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type. This allows us to
pass the full 32-bit precision of the pgid.seed to the callers. It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


# 4f6a7e5e 23-Feb-2013 Sage Weil <sage@inktank.com>

ceph: update support for PGID64, PGPOOL3, OSDENC protocol features

Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features.
These have been present in ceph.git since v0.42, Feb 2012.

ceph: update support for PGID64, PGPOOL3, OSDENC protocol features

Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features.
These have been present in ceph.git since v0.42, Feb 2012. Require these
features to simplify support; nobody is running older userspace.

Note that the new request and reply encoding is still not in place, so the new
code is not yet functional.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


# 5b191d99 23-Feb-2013 Sage Weil <sage@inktank.com>

libceph: decode into cpu-native ceph_pg type

Always decode data into our cpu-native ceph_pg type that has the correct
field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the
legacy

libceph: decode into cpu-native ceph_pg type

Always decode data into our cpu-native ceph_pg type that has the correct
field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the
legacy protocol.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


Revision tags: v3.8, v3.8-rc7, v3.8-rc6, v3.8-rc5, v3.8-rc4, v3.8-rc3
# 12979354 08-Jan-2013 Sage Weil <sage@inktank.com>

libceph: rename ceph_pg -> ceph_pg_v1

Rename the old version this type to distinguish it from the new version.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>


# 1ec3911d 25-Jan-2013 Cong Ding <dinggnu@gmail.com>

libceph: fix undefined behavior when using snprintf()

The variable "str" is used as both the source and destination in
function snprintf(), which is undefined behavior based on C11. The
original des

libceph: fix undefined behavior when using snprintf()

The variable "str" is used as both the source and destination in
function snprintf(), which is undefined behavior based on C11. The
original description in C11 is:
"If copying takes place between objects that
overlap, the behavior is undefined."

And, the function of ceph_osdmap_state_str() is to return the osdmap
state, so it should return "doesn't exist" when all the conditions
are not satisfied. I fix it in this patch.

[elder@inktank.com: shortened the commit message]

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


Revision tags: v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6
# e8afad65 14-Nov-2012 Alex Elder <elder@inktank.com>

libceph: pass length to ceph_calc_file_object_mapping()

ceph_calc_file_object_mapping() takes (among other things) a "file"
offset and length, and based on the layout, determines the object
number (

libceph: pass length to ceph_calc_file_object_mapping()

ceph_calc_file_object_mapping() takes (among other things) a "file"
offset and length, and based on the layout, determines the object
number ("bno") backing the affected portion of the file's data and
the offset into that object where the desired range begins. It also
computes the size that should be used for the request--either the
amount requested or something less if that would exceed the end of
the object.

This patch changes the input length parameter in this function so it
is used only for input. That is, the argument will be passed by
value rather than by address, so the value provided won't get
updated by the function.

The value would only get updated if the length would surpass the
current object, and in that case the value it got updated to would
be exactly that returned in *oxlen.

Only one of the two callers is affected by this change. Update
ceph_calc_raw_layout() so it records any updated value.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

show more ...


# 1604f488 30-Nov-2012 Jim Schutt <jaschut@sandia.gov>

libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed

Add libceph support for a new CRUSH tunable recently added to Ceph servers.

Consider the CRUSH rule
step choosel

libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed

Add libceph support for a new CRUSH tunable recently added to Ceph servers.

Consider the CRUSH rule
step chooseleaf firstn 0 type <node_type>

This rule means that <n> replicas will be chosen in a manner such that
each chosen leaf's branch will contain a unique instance of <node_type>.

When an object is re-replicated after a leaf failure, if the CRUSH map uses
a chooseleaf rule the remapped replica ends up under the <node_type> bucket
that held the failed leaf. This causes uneven data distribution across the
storage cluster, to the point that when all the leaves but one fail under a
particular <node_type> bucket, that remaining leaf holds all the data from
its failed peers.

This behavior also limits the number of peers that can participate in the
re-replication of the data held by the failed leaf, which increases the
time required to re-replicate after a failure.

For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
inner and outer descents.

If the tree descent down to <node_type> is the outer descent, and the descent
from <node_type> down to a leaf is the inner descent, the issue is that a
down leaf is detected on the inner descent, so only the inner descent is
retried.

In order to disperse re-replicated data as widely as possible across a
storage cluster after a failure, we want to retry the outer descent. So,
fix up crush_choose() to allow the inner descent to return immediately on
choosing a failed leaf. Wire this up as a new CRUSH tunable.

Note that after this change, for a chooseleaf rule, if the primary OSD
in a placement group has failed, choosing a replacement may result in
one of the other OSDs in the PG colliding with the new primary. This
requires that OSD's data for that PG to need moving as well. This
seems unavoidable but should be relatively rare.

This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Sage Weil <sage@inktank.com>

show more ...


Revision tags: v3.7-rc5, v3.7-rc4
# 72afc71f 30-Oct-2012 Alex Elder <elder@inktank.com>

libceph: define ceph_pg_pool_name_by_id()

Define and export function ceph_pg_pool_name_by_id() to supply
the name of a pg pool whose id is given. This will be used by
the next patch.

Signed-off-by

libceph: define ceph_pg_pool_name_by_id()

Define and export function ceph_pg_pool_name_by_id() to supply
the name of a pg pool whose id is given. This will be used by
the next patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

show more ...


# 0ed7285e 29-Oct-2012 Sage Weil <sage@inktank.com>

libceph: fix osdmap decode error paths

Ensure that we set the err value correctly so that we do not pass a 0
value to ERR_PTR and confuse the calling code. (In particular,
osd_client.c handle_map()

libceph: fix osdmap decode error paths

Ensure that we set the err value correctly so that we do not pass a 0
value to ERR_PTR and confuse the calling code. (In particular,
osd_client.c handle_map() will BUG(!newmap)).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


Revision tags: v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6
# d63b77f4 24-Sep-2012 Sage Weil <sage@inktank.com>

libceph: check for invalid mapping

If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder

libceph: check for invalid mapping

If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


Revision tags: v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1
# 546f04ef 30-Jul-2012 Sage Weil <sage@inktank.com>

libceph: support crush tunables

The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not pr

libceph: support crush tunables

The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.

Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

show more ...


Revision tags: v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2
# a5506049 06-Jun-2012 Xi Wang <xi.wang@gmail.com>

libceph: fix overflow in osdmap_apply_incremental()

On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)'
and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad).
It wo

libceph: fix overflow in osdmap_apply_incremental()

On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)'
and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad).
It would also overflow the subsequent kmalloc() size, leading to
out-of-bounds write.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

show more ...


12345678910>>...12