#
98c4bfe9 |
| 17-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: check reply num_data_items in setup_request_data()
setup_request_data() adds message data items to both request and reply messages, but only checks request num_data_items before proceeding
libceph: check reply num_data_items in setup_request_data()
setup_request_data() adds message data items to both request and reply messages, but only checks request num_data_items before proceeding with the loop. This is wrong because if an op doesn't have any request data items but has a reply data item (e.g. read), a duplicate data item gets added to the message on every resend attempt.
This went unnoticed for years but now that message data items are preallocated, it promptly crashes in ceph_msg_data_add(). Amend the signature to make it clear that setup_request_data() operates on both request and reply messages. Also, remove data_len assert -- we have another one in prepare_write_message().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
0d9c1ab3 |
| 15-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: preallocate message data items
Currently message data items are allocated with ceph_msg_data_create() in setup_request_data() inside send_request(). send_request() has never been allowed t
libceph: preallocate message data items
Currently message data items are allocated with ceph_msg_data_create() in setup_request_data() inside send_request(). send_request() has never been allowed to fail, so each allocation is followed by a BUG_ON:
data = ceph_msg_data_create(...); BUG_ON(!data);
It's been this way since support for multiple message data items was added in commit 6644ed7b7e04 ("libceph: make message data be a pointer") in 3.10.
There is no reason to delay the allocation of message data items until the last possible moment and we certainly don't need a linked list of them as they are only ever appended to the end and never erased. Make ceph_msg_new2() take max_data_items and adapt the rest of the code.
Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
26f887e0 |
| 15-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls
The current requirement is that ceph_osdc_alloc_messages() should be called after oid and oloc are known. In preparation for preallocating
libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls
The current requirement is that ceph_osdc_alloc_messages() should be called after oid and oloc are known. In preparation for preallocating message data items, move ceph_osdc_alloc_messages() further down, so that it is called when OSD op codes are known.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
39e58c34 |
| 15-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: introduce alloc_watch_request()
ceph_osdc_alloc_messages() call will be moved out of alloc_linger_request() in the next commit, which means that ceph_osdc_watch() will need to call ceph_osd
libceph: introduce alloc_watch_request()
ceph_osdc_alloc_messages() call will be moved out of alloc_linger_request() in the next commit, which means that ceph_osdc_watch() will need to call ceph_osdc_alloc_messages() twice. Add a helper for that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v4.18.14 |
|
#
81c65213 |
| 11-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: assign cookies in linger_submit()
Register lingers directly in linger_submit(). This avoids allocating memory for notify pagelist while holding osdc->lock and simplifies both callers of li
libceph: assign cookies in linger_submit()
Register lingers directly in linger_submit(). This avoids allocating memory for notify pagelist while holding osdc->lock and simplifies both callers of linger_submit().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
3b83f60d |
| 11-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get()
ceph_msgpool_get() can fall back to ceph_msg_new() when it is asked for a message whose front portion is larger than pool->front_len.
libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get()
ceph_msgpool_get() can fall back to ceph_msg_new() when it is asked for a message whose front portion is larger than pool->front_len. However the caller always passes 0, effectively disabling that code path. The allocation goes to the message pool and returns a message with a front that is smaller than requested, setting us up for a crash.
One example of this is a directory with a large number of snapshots. If its snap context doesn't fit, we oops in encode_request_partial().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
41a264e1 |
| 13-Oct-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op()
Any uninitialized or unknown ops will be caught by the default clause anyway.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v4.18.13, v4.18.12, v4.18.11 |
|
#
89486833 |
| 28-Sep-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist()
Because send_mds_reconnect() wants to send a message with a pagelist and pass the ownership to the messenger, ceph_msg_data_a
libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist()
Because send_mds_reconnect() wants to send a message with a pagelist and pass the ownership to the messenger, ceph_msg_data_add_pagelist() consumes a ref which is then put in ceph_msg_data_destroy(). This makes managing pagelists in the OSD client (where they are wrapped in ceph_osd_data) unnecessarily hard because the handoff only happens in ceph_osdc_start_request() instead of when the pagelist is passed to ceph_osd_data_pagelist_init(). I counted several memory leaks on various error paths.
Fix up ceph_msg_data_add_pagelist() and carry a pagelist ref in ceph_osd_data.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
33165d47 |
| 28-Sep-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: introduce ceph_pagelist_alloc()
struct ceph_pagelist cannot be embedded into anything else because it has its own refcount. Merge allocation and initialization together.
Signed-off-by: Il
libceph: introduce ceph_pagelist_alloc()
struct ceph_pagelist cannot be embedded into anything else because it has its own refcount. Merge allocation and initialization together.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
24639ce5 |
| 26-Sep-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: osd_req_op_cls_init() doesn't need to take opcode
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
Revision tags: v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11 |
|
#
6daca13d |
| 27-Jul-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: add authorizer challenge
When a client authenticates with a service, an authorizer is sent with a nonce to the service (ceph_x_authorize_[ab]) and the service responds with a mutation of th
libceph: add authorizer challenge
When a client authenticates with a service, an authorizer is sent with a nonce to the service (ceph_x_authorize_[ab]) and the service responds with a mutation of that nonce (ceph_x_authorize_reply). This lets the client verify the service is who it says it is but it doesn't protect against a replay: someone can trivially capture the exchange and reuse the same authorizer to authenticate themselves.
Allow the service to reject an initial authorizer with a random challenge (ceph_x_authorize_challenge). The client then has to respond with an updated authorizer proving they are able to decrypt the service's challenge and that the new authorizer was produced for this specific connection instance.
The accepting side requires this challenge and response unconditionally if the client side advertises they have CEPHX_V2 feature bit.
This addresses CVE-2018-1128.
Link: http://tracker.ceph.com/issues/24836 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
show more ...
|
Revision tags: v4.17.10, v4.17.9, v4.17.8, v4.17.7 |
|
#
fac02ddf |
| 13-Jul-2018 |
Arnd Bergmann <arnd@arndb.de> |
libceph: use timespec64 for r_mtime
The request mtime field is used all over ceph, and is currently represented as a 'timespec' structure in Linux. This changes it to timespec64 to allow times beyon
libceph: use timespec64 for r_mtime
The request mtime field is used all over ceph, and is currently represented as a 'timespec' structure in Linux. This changes it to timespec64 to allow times beyond 2038, modifying all users at the same time.
[ Remove now redundant ts variable in writepage_nounlock(). ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v4.17.6, v4.17.5, v4.17.4, v4.17.3 |
|
#
6d54228f |
| 25-Jun-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: make ceph_osdc_notify{,_ack}() payload_len u32
The wire format dictates that payload_len fits into 4 bytes. Take u32 instead of size_t to reflect that.
All callers pass a small integer, s
libceph: make ceph_osdc_notify{,_ack}() payload_len u32
The wire format dictates that payload_len fits into 4 bytes. Take u32 instead of size_t to reflect that.
All callers pass a small integer, so no changes required.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
Revision tags: v4.17.2, v4.17.1, v4.17 |
|
#
acafe7e3 |
| 08-May-2018 |
Kees Cook <keescook@chromium.org> |
treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with me
treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:
struct foo { int stuff; void *entry[]; };
instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script:
// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len * // sizeof *pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@
- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@
- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@
- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
Signed-off-by: Kees Cook <keescook@chromium.org>
show more ...
|
#
a86f009f |
| 23-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: allocate the locator string with GFP_NOFAIL
calc_target() isn't supposed to fail with anything but POOL_DNE, in which case we report that the pool doesn't exist and fail the request with -E
libceph: allocate the locator string with GFP_NOFAIL
calc_target() isn't supposed to fail with anything but POOL_DNE, in which case we report that the pool doesn't exist and fail the request with -ENOENT. Doing this for -ENOMEM is at the very least confusing and also harmful -- as the preceding requests complete, a short-lived locator string allocation is likely to succeed after a wait.
(We used to call ceph_object_locator_to_pg() for a pi lookup. In theory that could fail with -ENOENT, hence the "ret != -ENOENT" warning being removed.)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|
#
c843d13c |
| 30-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: make abort_on_full a per-osdc setting
The intent behind making it a per-request setting was that it would be set for writes, but not for reads. As it is, the flag is set for all fs/ceph re
libceph: make abort_on_full a per-osdc setting
The intent behind making it a per-request setting was that it would be set for writes, but not for reads. As it is, the flag is set for all fs/ceph requests except for pool perm check stat request (technically a read).
ceph_osdc_abort_on_full() skips reads since the previous commit and I don't see a use case for marking individual requests.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
690f951d |
| 30-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't abort reads in ceph_osdc_abort_on_full()
Don't consider reads for aborting and use ->base_oloc instead of ->target_oloc, as done in __submit_request().
Strictly speaking, we shouldn'
libceph: don't abort reads in ceph_osdc_abort_on_full()
Don't consider reads for aborting and use ->base_oloc instead of ->target_oloc, as done in __submit_request().
Strictly speaking, we shouldn't be aborting FULL_TRY/FULL_FORCE writes either. But, there is an inconsistency in FULL_TRY/FULL_FORCE handling on the OSD side [1], so given that neither of these is used in the kernel client, leave it for when the OSD behaviour is sorted out.
[1] http://tracker.ceph.com/issues/24339
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
6001567c |
| 22-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: avoid a use-after-free during map check
Sending map check after complete_request() was called is not only useless, but can lead to a use-after-free as req->r_kref decrement in __complete_re
libceph: avoid a use-after-free during map check
Sending map check after complete_request() was called is not only useless, but can lead to a use-after-free as req->r_kref decrement in __complete_request() races with map check code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
29e87820 |
| 17-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: don't warn if req->r_abort_on_full is set
The "FULL or reached pool quota" warning is there to explain paused requests. No need to emit it if pausing isn't going to occur.
Signed-off-by:
libceph: don't warn if req->r_abort_on_full is set
The "FULL or reached pool quota" warning is there to explain paused requests. No need to emit it if pausing isn't going to occur.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
4eea0fef |
| 16-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: use for_each_request() in ceph_osdc_abort_on_full()
Scanning the trees just to see if there is anything to abort is unnecessary -- all that is needed here is to update the epoch barrier fir
libceph: use for_each_request() in ceph_osdc_abort_on_full()
Scanning the trees just to see if there is anything to abort is unnecessary -- all that is needed here is to update the epoch barrier first, before we start aborting. Simplify and do the update inside the loop before calling abort_request() for the first time.
The switch to for_each_request() also fixes a bug: homeless requests weren't even considered for aborting.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
88bc1922 |
| 21-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: defer __complete_request() to a workqueue
In the common case, req->r_callback is called by handle_reply() on the ceph-msgr worker thread without any locks. If handle_reply() fails, it is c
libceph: defer __complete_request() to a workqueue
In the common case, req->r_callback is called by handle_reply() on the ceph-msgr worker thread without any locks. If handle_reply() fails, it is called with both osd->lock and osdc->lock. In the map check case, it is called with just osdc->lock but held for write. Finally, if the request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(), it is called directly on the submitter's thread, again with both locks.
req->r_callback on the submitter's thread is relatively new (introduced in 4.12) and ripe for deadlocks -- e.g. writeback worker thread waiting on itself:
inode_wait_for_writeback+0x26/0x40 evict+0xb5/0x1a0 iput+0x1d2/0x220 ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph] writepages_finish+0x2d3/0x410 [ceph] __complete_request+0x26/0x60 [libceph] complete_request+0x2e/0x70 [libceph] __submit_request+0x256/0x330 [libceph] submit_request+0x2b/0x30 [libceph] ceph_osdc_start_request+0x25/0x40 [libceph] ceph_writepages_start+0xdfe/0x1320 [ceph] do_writepages+0x1f/0x70 __writeback_single_inode+0x45/0x330 writeback_sb_inodes+0x26a/0x600 __writeback_inodes_wb+0x92/0xc0 wb_writeback+0x274/0x330 wb_workfn+0x2d5/0x3b0
Defer __complete_request() to a workqueue in all failure cases so it's never on the same thread as ceph_osdc_start_request() and always called with no locks held.
Link: http://tracker.ceph.com/issues/23978 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
26df726b |
| 21-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: move more code into __complete_request()
Move req->r_completion wake up and req->r_kref decrement into __complete_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff
libceph: move more code into __complete_request()
Move req->r_completion wake up and req->r_kref decrement into __complete_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
show more ...
|
#
0d09c57d |
| 18-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: no need to call flush_workqueue() before destruction
destroy_workqueue() drains the workqueue before proceeding with destruction.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
66850df5 |
| 15-May-2018 |
Ilya Dryomov <idryomov@gmail.com> |
libceph: introduce ceph_osdc_abort_requests()
This will be used by the filesystem for "umount -f".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
#
fe943d50 |
| 11-Apr-2018 |
Chengguang Xu <cgxu519@gmx.com> |
libceph, rbd: add error handling for osd_req_op_cls_init()
Add proper error handling for osd_req_op_cls_init() to replace BUG_ON statement when failing from memory allocation.
Signed-off-by: Chengg
libceph, rbd: add error handling for osd_req_op_cls_init()
Add proper error handling for osd_req_op_cls_init() to replace BUG_ON statement when failing from memory allocation.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
show more ...
|