======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct
   bio_vecs, constructed from the raw biovecs but taking into account
   bi_bvec_done and bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.
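The scheme described above can be illustrated with a small userspace model.
This is only a sketch, not kernel code: struct model_bvec, struct model_iter,
model_iter_bvec(), model_iter_advance() and model_count_segments() are
hypothetical stand-ins for struct bio_vec, struct bvec_iter, bio_iter_iovec(),
bio_advance_iter() and a bio_for_each_segment() loop. The model has no
struct page, and it assumes that every segment has a non-zero length and that
bi_size never exceeds the total length of the array.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct bio_vec (no struct page). */
struct model_bvec {
	unsigned int bv_len;	/* length of this segment in bytes */
	unsigned int bv_offset;	/* start of the data within its buffer */
};

/* Mirrors the fields the text says were moved into struct bvec_iter. */
struct model_iter {
	unsigned long long bi_sector;	/* device address, 512-byte sectors */
	unsigned int bi_size;		/* bytes left to process */
	unsigned int bi_idx;		/* index of the current segment */
	unsigned int bi_bvec_done;	/* bytes already done in that segment */
};

/* Build the "illusion" of a partially completed segment, by value. */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter iter)
{
	struct model_bvec cur = bv[iter.bi_idx];

	cur.bv_offset += iter.bi_bvec_done;
	cur.bv_len -= iter.bi_bvec_done;
	if (cur.bv_len > iter.bi_size)
		cur.bv_len = iter.bi_size;
	return cur;
}

/* Advance the iterator by 'bytes'; the segment array is only read. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);

	iter->bi_sector += bytes >> 9;
	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}

/* Walk every remaining segment using a private copy of the iterator. */
static unsigned int model_count_segments(const struct model_bvec *bv,
					 struct model_iter iter)
{
	unsigned int nr = 0;

	while (iter.bi_size) {
		struct model_bvec cur = model_iter_bvec(bv, iter);

		model_iter_advance(bv, &iter, cur.bv_len);
		nr++;
	}
	return nr;
}
```

Note that model_count_segments() takes the iterator by value, so the caller's
iterator is left where it was, and that none of the functions write to the
segment array.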

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.
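The two-iterator pattern can be sketched in userspace as follows. This is a
simplified model and not the kernel's bio_copy_data(): struct model_bvec,
struct model_iter and the model_*() functions are hypothetical, and a plain
char buffer stands in for a page. It assumes non-zero segment lengths and a
bi_size that fits within each array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: a char buffer replaces struct page. */
struct model_bvec {
	char *bv_buf;
	unsigned int bv_len;
};

struct model_iter {
	unsigned int bi_size;		/* bytes left to process */
	unsigned int bi_idx;		/* index of the current segment */
	unsigned int bi_bvec_done;	/* bytes done in that segment */
};

static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);

	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}

/* Bytes available in the current segment, capped by bi_size. */
static unsigned int model_iter_len(const struct model_bvec *bv,
				   struct model_iter iter)
{
	unsigned int left = bv[iter.bi_idx].bv_len - iter.bi_bvec_done;

	return left < iter.bi_size ? left : iter.bi_size;
}

/* Copy until either side runs out; returns the number of bytes copied. */
static unsigned int model_copy_data(const struct model_bvec *dst,
				    struct model_iter dst_iter,
				    const struct model_bvec *src,
				    struct model_iter src_iter)
{
	unsigned int copied = 0;

	while (src_iter.bi_size && dst_iter.bi_size) {
		unsigned int s = model_iter_len(src, src_iter);
		unsigned int d = model_iter_len(dst, dst_iter);
		unsigned int bytes = s < d ? s : d;

		memcpy(dst[dst_iter.bi_idx].bv_buf + dst_iter.bi_bvec_done,
		       src[src_iter.bi_idx].bv_buf + src_iter.bi_bvec_done,
		       bytes);
		model_iter_advance(src, &src_iter, bytes);
		model_iter_advance(dst, &dst_iter, bytes);
		copied += bytes;
	}
	return copied;
}
```

Each step copies the smaller of the two current segments and advances both
iterators by that amount, so the segment boundaries of the source and the
destination never have to line up.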

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places. Now that the
   biovec is never modified, saving a copy of the bvec_iter is enough.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split
   arbitrarily sized bios - because the new bio can share the old bio's
   biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.
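Why sharing makes splitting cheap can be shown with a userspace sketch. It
is a model only: struct model_bvec, struct model_iter, model_iter_advance()
and model_iter_split() are hypothetical names, the split point is given in
bytes, and no bio is cloned. The point is that a split only produces two
iterators over one unchanged array.

```c
#include <assert.h>

/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct model_bvec {
	unsigned int bv_len;
};

struct model_iter {
	unsigned long long bi_sector;	/* device address, 512-byte sectors */
	unsigned int bi_size;		/* bytes covered by this iterator */
	unsigned int bi_idx;		/* index of the current segment */
	unsigned int bi_bvec_done;	/* bytes done in that segment */
};

static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);

	iter->bi_sector += bytes >> 9;
	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}

/*
 * Split off the first 'bytes' of *iter. The returned iterator covers the
 * front part; *iter is advanced so that it covers the remainder. Both keep
 * referring to the same segment array, which is never written.
 */
static struct model_iter model_iter_split(const struct model_bvec *bv,
					  struct model_iter *iter,
					  unsigned int bytes)
{
	struct model_iter front = *iter;

	assert(bytes > 0 && bytes < iter->bi_size);

	front.bi_size = bytes;
	model_iter_advance(bv, iter, bytes);
	return front;
}
```

The front iterator ends midway through a segment purely because its bi_size
runs out there, and the remainder starts midway through the same segment via
bi_bvec_done. Since both depend on the shared array, the array has to stay
alive until both are finished with it.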

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine on _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.

Usage of helpers:
=================

* The following helpers, whose names have the suffix `_all`, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

	bio_for_each_segment_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	rq_for_each_bvec()
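The difference between the two kinds of helpers can be sketched with a
userspace model of how one multi-page vector breaks down into single-page
pieces. Everything here is an assumption made for illustration: the
struct model_bvec layout, the 4 KiB MODEL_PAGE_SIZE, the use of a page index
in place of struct page, and the two model_*() functions.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical vector; a page index replaces struct page *. */
struct model_bvec {
	unsigned int bv_page;	/* index of the first page */
	unsigned int bv_len;	/* length in bytes, may span several pages */
	unsigned int bv_offset;	/* offset from the start of the first page */
};

/* Return the single-page piece that starts 'done' bytes into 'bv'. */
static struct model_bvec model_single_page(struct model_bvec bv,
					   unsigned int done)
{
	struct model_bvec seg;
	unsigned int start = bv.bv_offset + done;
	unsigned int in_page = start % MODEL_PAGE_SIZE;
	unsigned int room = MODEL_PAGE_SIZE - in_page;
	unsigned int left = bv.bv_len - done;

	assert(done < bv.bv_len);

	seg.bv_page = bv.bv_page + start / MODEL_PAGE_SIZE;
	seg.bv_offset = in_page;
	seg.bv_len = left < room ? left : room;
	return seg;
}

/* Count the single-page pieces a segment-style walk would visit. */
static unsigned int model_count_single_pages(struct model_bvec bv)
{
	unsigned int done = 0, nr = 0;

	while (done < bv.bv_len) {
		done += model_single_page(bv, done).bv_len;
		nr++;
	}
	return nr;
}
```

In this model, a bvec-style walk would hand the caller the whole
struct model_bvec once, while a segment-style walk would hand over each piece
returned by model_single_page() in turn.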