======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.
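
The split between an immutable vector array and a mutable iterator can be
sketched in user space. The block below is only an illustrative model, not the
kernel implementation: struct model_bvec, struct model_iter and both functions
are hypothetical names invented for this sketch, and they merely mimic the
roles of bv_len/bv_offset and bi_size/bi_idx/bi_bvec_done described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a biovec entry; never modified once built. */
struct model_bvec {
	size_t len;    /* plays the role of bv_len */
	size_t offset; /* plays the role of bv_offset */
};

/* Hypothetical stand-in for struct bvec_iter; all mutation happens here. */
struct model_iter {
	size_t size;      /* bytes remaining, like bi_size */
	size_t idx;       /* index of the current vector, like bi_idx */
	size_t bvec_done; /* bytes completed in the current vector */
};

/*
 * Construct the "current" vector on the fly from the raw array and the
 * iterator, taking bvec_done and size into account.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur;
	size_t left = bv[it.idx].len - it.bvec_done;

	cur.offset = bv[it.idx].offset + it.bvec_done;
	cur.len = left < it.size ? left : it.size;
	return cur;
}

/* Advance by a number of bytes; only the iterator changes. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, size_t bytes)
{
	assert(bytes <= it->size);

	it->size -= bytes;
	bytes += it->bvec_done;
	/* Skip every vector that is now fully completed. */
	while (bytes && bytes >= bv[it->idx].len) {
		bytes -= bv[it->idx].len;
		it->idx++;
	}
	it->bvec_done = bytes;
}
```

In this model a partial completion leaves the array untouched; a caller only
ever sees the adjusted offset and length returned by model_iter_bvec().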

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

As of 5.12 bvec segments with zero bv_len are not supported.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken, though, to ensure the biovec isn't freed while the
   split bio is still using it, in case the original bio completes first.
   Using bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they are - submitting partially completed bios is
   perfectly fine.

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.

Usage of helpers:
=================

* The following helpers, whose names have the suffix `_all`, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

        bio_for_each_segment_all()
        bio_for_each_bvec_all()
        bio_first_bvec_all()
        bio_first_page_all()
        bio_first_folio_all()
        bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

        bio_for_each_segment()
        bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

        bio_for_each_bvec()
        bio_for_each_bvec_all()
        rq_for_each_bvec()
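
The difference between the single-page and multi-page helpers can be sketched
with a small user-space model. This is only an illustration under stated
assumptions, not kernel code: MODEL_PAGE_SIZE (assumed to be 4096), struct
model_seg and model_split_bvec() are hypothetical names made up for the
example. It shows how one multi-page vector, as a multi-page helper would
present it, breaks down into the per-page pieces a single-page helper would
present instead.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical description of one single-page segment. */
struct model_seg {
	size_t page;   /* page index relative to the start of the vector */
	size_t offset; /* byte offset inside that page */
	size_t len;    /* bytes covered within that page */
};

/*
 * Split one multi-page vector, given by its starting offset and length,
 * into single-page segments. At most max entries are written to out; the
 * return value is the number of segments the vector needs.
 */
static size_t model_split_bvec(size_t offset, size_t len,
			       struct model_seg *out, size_t max)
{
	size_t n = 0;
	size_t page = offset / MODEL_PAGE_SIZE;

	offset %= MODEL_PAGE_SIZE;
	while (len) {
		/* A segment never crosses a page boundary. */
		size_t chunk = MODEL_PAGE_SIZE - offset;

		if (chunk > len)
			chunk = len;
		if (n < max) {
			out[n].page = page;
			out[n].offset = offset;
			out[n].len = chunk;
		}
		n++;
		len -= chunk;
		page++;
		offset = 0;
	}
	return n;
}
```

In this model, a vector starting 512 bytes into its first page and covering
8192 bytes is a single item for a multi-page walk, but three items (3584, 4096
and 512 bytes) for a single-page walk.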