======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct
   bio_vecs, constructed from the raw biovecs but taking into account
   bi_bvec_done and bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

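The scheme above can be sketched in plain userspace C. This is only an
illustrative model, not kernel code: struct seg, struct seg_iter,
seg_iter_cur() and seg_iter_advance() are hypothetical names standing in for
struct bio_vec, struct bvec_iter, bio_iter_iovec() and bvec_iter_advance(),
and a plain byte pointer replaces the page and offset pair. The point it tries
to show is that partial completion touches only the iterator, while the
"current segment" is computed by value from the untouched array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
struct seg {                    /* models one immutable biovec entry */
	const char *base;
	unsigned int len;
};

struct seg_iter {               /* models struct bvec_iter */
	unsigned int size;      /* bytes left to process (cf. bi_size) */
	unsigned int idx;       /* current entry (cf. bi_idx) */
	unsigned int done;      /* bytes consumed in current entry (cf. bi_bvec_done) */
};

/* Return the current segment by value; the array itself is never written. */
static struct seg seg_iter_cur(const struct seg *v, struct seg_iter it)
{
	unsigned int left = v[it.idx].len - it.done;
	struct seg s = {
		.base = v[it.idx].base + it.done,
		.len = left < it.size ? left : it.size,
	};
	return s;
}

/* Consume 'bytes' bytes; only the iterator changes.
 * Assumes the array holds no zero-length entries. */
static void seg_iter_advance(const struct seg *v, struct seg_iter *it,
			     unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->done;
	while (bytes && bytes >= v[it->idx].len) {
		bytes -= v[it->idx].len;
		it->idx++;
	}
	it->done = bytes;
}
```
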
As of 5.12, bvec segments with zero bv_len are not supported.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine on _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.

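The two-iterator walk described for bio_copy_data() can be illustrated with a
small userspace model. This is a sketch under stated assumptions, not the
kernel implementation: struct seg, struct seg_iter, seg_iter_advance() and
seg_copy() are hypothetical names, and plain byte buffers stand in for pages.
Each step copies the largest run that fits in both current segments, then
advances both iterators by that amount, so differently sized segments need no
special casing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; these are NOT the kernel definitions. */
struct seg { char *base; unsigned int len; };          /* cf. struct bio_vec */
struct seg_iter { unsigned int size, idx, done; };     /* cf. struct bvec_iter */

/* Consume 'bytes' bytes from the iterator; the array is left untouched.
 * Assumes the array holds no zero-length entries. */
static void seg_iter_advance(const struct seg *v, struct seg_iter *it,
			     unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->done;
	while (bytes && bytes >= v[it->idx].len) {
		bytes -= v[it->idx].len;
		it->idx++;
	}
	it->done = bytes;
}

/* Copy until either side runs out; returns the number of bytes copied.
 * Both iterators are taken by value, so the caller's copies are unchanged. */
static unsigned int seg_copy(const struct seg *dst, struct seg_iter di,
			     const struct seg *src, struct seg_iter si)
{
	unsigned int copied = 0;

	while (si.size && di.size) {
		unsigned int sl = src[si.idx].len - si.done;
		unsigned int dl = dst[di.idx].len - di.done;
		unsigned int n = sl < dl ? sl : dl;

		/* Never run past the range either iterator describes. */
		if (n > si.size)
			n = si.size;
		if (n > di.size)
			n = di.size;

		memcpy(dst[di.idx].base + di.done,
		       src[si.idx].base + si.done, n);
		seg_iter_advance(src, &si, n);
		seg_iter_advance(dst, &di, n);
		copied += n;
	}
	return copied;
}
```
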
Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.

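Why an entry count is unnecessary once iterators share an array can be shown
with another userspace sketch. All names here (struct seg, struct seg_iter,
seg_iter_split(), seg_iter_count()) are hypothetical and only loosely modeled
on how a split is described above: the front half is a copy of the iterator
with a smaller size, and the remainder is the original iterator advanced past
those bytes. Both halves keep pointing at the same unmodified array, and each
walk stops when its own size reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel definitions. */
struct seg { const char *base; unsigned int len; };    /* cf. struct bio_vec */
struct seg_iter { unsigned int size, idx, done; };     /* cf. struct bvec_iter */

/* Consume 'bytes' bytes from the iterator; the array is left untouched.
 * Assumes the array holds no zero-length entries. */
static void seg_iter_advance(const struct seg *v, struct seg_iter *it,
			     unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->done;
	while (bytes && bytes >= v[it->idx].len) {
		bytes -= v[it->idx].len;
		it->idx++;
	}
	it->done = bytes;
}

/* Split off the first 'bytes' bytes into *front; *it keeps the remainder.
 * No entries are copied and no per-half entry count is computed. */
static void seg_iter_split(const struct seg *v, struct seg_iter *it,
			   struct seg_iter *front, unsigned int bytes)
{
	assert(bytes > 0 && bytes < it->size);
	*front = *it;
	front->size = bytes;
	seg_iter_advance(v, it, bytes);
}

/* Count the (possibly partial) segments an iterator covers, using only
 * its size as the end condition. */
static unsigned int seg_iter_count(const struct seg *v, struct seg_iter it)
{
	unsigned int n = 0;

	while (it.size) {
		unsigned int left = v[it.idx].len - it.done;

		seg_iter_advance(v, &it, left < it.size ? left : it.size);
		n++;
	}
	return n;
}
```
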
Usage of helpers:
=================

* The following helpers, whose names have the suffix `_all`, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

	bio_for_each_segment_all()
	bio_for_each_bvec_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_first_folio_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	bio_for_each_bvec_all()
	rq_for_each_bvec()