.. SPDX-License-Identifier: GPL-2.0

=================================
NETWORK FILESYSTEM HELPER LIBRARY
=================================

.. Contents:

 - Overview.
 - Buffered read helpers.
 - Read helper functions.
 - Read helper structures.
 - Read helper operations.
 - Read helper procedure.
 - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations. For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server. The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->readpage(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handles transparent huge pages (THPs).

 * Insulates the netfs from VM interface changes.

 * Allows the netfs to arbitrarily split reads up into pieces, even ones that
   don't match page sizes or page alignments and that may cross pages.

 * Allows the netfs to expand a readahead request in both directions to meet
   its needs.

 * Allows the netfs to partially fulfil a read, which will then be resubmitted.

 * Handles local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handles clearing of buffer regions that have no corresponding data on the
   server.

 * Handles retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations. This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

 * void netfs_readahead(struct readahead_control *ractl,
                        const struct netfs_read_request_ops *ops,
                        void *netfs_priv);
 * int netfs_readpage(struct file *file,
                      struct page *page,
                      const struct netfs_read_request_ops *ops,
                      void *netfs_priv);
 * int netfs_write_begin(struct file *file,
                         struct address_space *mapping,
                         loff_t pos,
                         unsigned int len,
                         unsigned int flags,
                         struct page **_page,
                         void **_fsdata,
                         const struct netfs_read_request_ops *ops,
                         void *netfs_priv);

Each corresponds to a VM operation, with the addition of a couple of parameters
for the use of the read helpers:

 * ``ops``

   A table of operations through which the helpers can talk to the filesystem.

 * ``netfs_priv``

   Filesystem private data (can be NULL).

Both of these values will be stored into the read request structure.

For ->readahead() and ->readpage(), the network filesystem should just jump
into the corresponding read helper, whereas for ->write_begin() it may be a
little more complicated, as the network filesystem might want to flush
conflicting writes or track dirty data, and it needs to put the acquired page
if an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations. Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it.
If some parts of the request are in progress when an error occurs, the request
will get partially completed if sufficient data is read.

Additionally, there is::

 * void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
                                ssize_t transferred_or_error,
                                bool was_async);

which should be called to complete a read subrequest. This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (ie. whether the follow-on processing can be
done in the current context, given this may involve sleeping).


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read. The first is a structure that manages a read request as a whole::

    struct netfs_read_request {
        struct inode                 *inode;
        struct address_space         *mapping;
        struct netfs_cache_resources cache_resources;
        void                         *netfs_priv;
        loff_t                       start;
        size_t                       len;
        loff_t                       i_size;
        const struct netfs_read_request_ops *netfs_ops;
        unsigned int                 debug_id;
        ...
    };

The above fields are the ones the netfs can use. They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from. The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data. The value for this can be passed in
   to the helper functions or set during the request. The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length. These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table. The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

    struct netfs_read_subrequest {
        struct netfs_read_request *rreq;
        loff_t                    start;
        size_t                    len;
        size_t                    transferred;
        unsigned long             flags;
        unsigned short            debug_index;
        ...
    };

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another. The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of this slice's data transferred so far. The network filesystem
   or cache should start the operation this far into the slice. If a short
   read occurs, the helpers will call again, having updated this to reflect
   the amount read so far.

 * ``flags``

   Flags pertaining to the read. There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
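
The interplay of ``start``, ``len``, ``transferred`` and
``NETFS_SREQ_CLEAR_TAIL`` can be illustrated with a small userspace C model.
This is a sketch under stated assumptions, not kernel code: ``struct
mock_subreq``, ``MOCK_SREQ_CLEAR_TAIL``, ``mock_issue_op()`` and
``mock_run_slice()`` are hypothetical, one byte array stands in for the server
and another for the pagecache, and the retry policy is a simplification of the
read helper procedure (reissue a short read while it makes progress, or clear
the tail instead if the flag is set).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NETFS_SREQ_CLEAR_TAIL; not the kernel value. */
#define MOCK_SREQ_CLEAR_TAIL 0x1UL

/* Hypothetical userspace mirror of the documented subrequest fields. */
struct mock_subreq {
	size_t start;		/* file position of the slice */
	size_t len;		/* length of the slice */
	size_t transferred;	/* amount of the slice done so far */
	unsigned long flags;
};

/*
 * Hypothetical issue_op()-style read: copy at most max_chunk bytes from a
 * "server" image into dst, starting ->transferred bytes into the slice,
 * and advance ->transferred. Returns the number of bytes copied.
 */
static size_t mock_issue_op(struct mock_subreq *subreq, char *dst,
			    const char *server, size_t server_size,
			    size_t max_chunk)
{
	size_t pos, n;

	assert(subreq->transferred <= subreq->len);
	pos = subreq->start + subreq->transferred;
	n = subreq->len - subreq->transferred;

	if (pos >= server_size)
		return 0;		/* at or past the server's EOF */
	if (n > server_size - pos)
		n = server_size - pos;
	if (n > max_chunk)
		n = max_chunk;		/* simulate a short read */
	memcpy(dst + pos, server + pos, n);
	subreq->transferred += n;
	return n;
}

/*
 * Helper-side handling of one slice: a short read is reissued while it
 * makes progress, unless CLEAR_TAIL is set, in which case the rest of
 * the slice is zeroed instead. Returns 1 if the slice is complete.
 */
static int mock_run_slice(struct mock_subreq *subreq, char *dst,
			  const char *server, size_t server_size,
			  size_t max_chunk)
{
	for (;;) {
		size_t n = mock_issue_op(subreq, dst, server, server_size,
					 max_chunk);

		if (subreq->transferred == subreq->len)
			return 1;
		if (subreq->flags & MOCK_SREQ_CLEAR_TAIL) {
			memset(dst + subreq->start + subreq->transferred, 0,
			       subreq->len - subreq->transferred);
			subreq->transferred = subreq->len;
			return 1;
		}
		if (n == 0)
			return 0;	/* no progress; leave the slice short */
	}
}
```

For example, a slice covering bytes 4 to 15 of a 10-byte file, with the tail
flag set, ends with bytes 10 to 15 zeroed; without the flag, and with each read
limited to 3 bytes, it stops short with ``transferred`` at 6.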


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

    struct netfs_read_request_ops {
        void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
        bool (*is_cache_enabled)(struct inode *inode);
        int (*begin_cache_operation)(struct netfs_read_request *rreq);
        void (*expand_readahead)(struct netfs_read_request *rreq);
        bool (*clamp_length)(struct netfs_read_subrequest *subreq);
        void (*issue_op)(struct netfs_read_subrequest *subreq);
        bool (*is_still_valid)(struct netfs_read_request *rreq);
        int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
                                 struct page *page, void **_fsdata);
        void (*done)(struct netfs_read_request *rreq);
        void (*cleanup)(struct address_space *mapping, void *netfs_priv);
    };

The operations are as follows:

 * ``init_rreq()``

   [Optional] This is called to initialise the request structure. It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``is_cache_enabled()``

   [Required] This is called by netfs_write_begin() to ask if the file is being
   cached. It should return true if it is being cached and false otherwise.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read. The netfs
   library module cannot access the cache directly, so the filesystem should
   call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.
   If an error is reported, the operation may proceed anyway, just without
   local caching (only out of memory and interruption errors cause failure
   here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request. The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made. If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure. Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest. The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_op()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading. In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred. The filesystem also should not deal with setting pages
   uptodate, unlocking them or dropping their refs; the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the pages locked, but not pinned. It is possible
   to use the ITER_XARRAY iov iterator to refer to the range of the inode that
   is being operated upon without the need to allocate large bvec tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid. It should return true if it is still valid and false
   if not. If it is not, the data will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the page to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the page should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the pages in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.


Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request. This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source. This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

 * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the pages that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any pages that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the pages will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work. If using fscache, for
example, the method would call::

    int fscache_begin_read_operation(struct netfs_read_request *rreq,
                                     struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.
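
The split of responsibilities in this set-up step can be modelled in userspace
C. This is a sketch, not fscache's implementation: every name in it (``struct
mock_rreq``, ``mock_cache_begin_read()``, ``mock_begin_cache_operation()``,
``mock_setup_cache()``) is hypothetical, ``mock_cache_begin_read()`` merely
stands in for a cache entry point such as fscache_begin_read_operation(), and
its ``-ENOBUFS`` "not cached" result is an assumption made for illustration.
The error policy follows the description of ``begin_cache_operation()``: only
out-of-memory and interruption errors fail the read, and any other error means
the read proceeds without local caching.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct mock_cache_ops {
	const char *name;	/* a real table would hold method pointers */
};

struct mock_cache_resources {
	const struct mock_cache_ops *ops;	/* NULL: no caching */
	void *cache_priv;
};

struct mock_rreq {
	struct mock_cache_resources cache_resources;
	void *cookie;		/* hypothetical per-file cache cookie */
};

static const struct mock_cache_ops mock_cache_table = { "mock-cache" };

/*
 * Cache-side entry point, standing in for something like
 * fscache_begin_read_operation(). inject_err lets a caller simulate
 * a failure.
 */
static int mock_cache_begin_read(struct mock_rreq *rreq, void *cookie,
				 int inject_err)
{
	if (inject_err)
		return inject_err;
	if (!cookie)
		return -ENOBUFS;	/* assumed "file not cached" result */
	/* The cache hangs its state and its own ops table off the request. */
	rreq->cache_resources.ops = &mock_cache_table;
	rreq->cache_resources.cache_priv = cookie;
	return 0;
}

/* Netfs-side ->begin_cache_operation(): just forwards to the cache. */
static int mock_begin_cache_operation(struct mock_rreq *rreq, int inject_err)
{
	return mock_cache_begin_read(rreq, rreq->cookie, inject_err);
}

/*
 * Helper-side policy: only out-of-memory and interruption errors fail the
 * read; any other error means the read proceeds without local caching.
 */
static int mock_setup_cache(struct mock_rreq *rreq, int inject_err)
{
	int ret = mock_begin_cache_operation(rreq, inject_err);

	if (ret == -ENOMEM || ret == -EINTR)
		return ret;
	assert(ret < 0 || rreq->cache_resources.ops != NULL);
	if (ret < 0)
		rreq->cache_resources.ops = NULL;	/* server only */
	return 0;
}
```

A request whose file has no cookie therefore still succeeds, just with
``cache_resources.ops`` left NULL, while an injected ``-ENOMEM`` is passed back
to the caller.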

The netfs_read_request object contains a place for the cache to hang its
state::

    struct netfs_cache_resources {
        const struct netfs_cache_ops *ops;
        void                         *cache_priv;
        void                         *cache_priv2;
    };

This contains an operations table pointer and two private pointers. The
operation table looks like the following::

    struct netfs_cache_ops {
        void (*end_operation)(struct netfs_cache_resources *cres);

        void (*expand_readahead)(struct netfs_cache_resources *cres,
                                 loff_t *_start, size_t *_len, loff_t i_size);

        enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
                                               loff_t i_size);

        int (*read)(struct netfs_cache_resources *cres,
                    loff_t start_pos,
                    struct iov_iter *iter,
                    bool seek_data,
                    netfs_io_terminated_t term_func,
                    void *term_func_priv);

        int (*write)(struct netfs_cache_resources *cres,
                     loff_t start_pos,
                     struct iov_iter *iter,
                     netfs_io_terminated_t term_func,
                     void *term_func_priv);
    };

With a termination handler function pointer::

    typedef void (*netfs_io_terminated_t)(void *priv,
                                          ssize_t transferred_or_error,
                                          bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction. This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.
   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared, downloaded from the
   server or read from the cache, or whether slicing should be given up at the
   current point.

 * ``read()``

   [Required] Called to read from the cache. The start file offset is given
   along with an iterator to read to, which gives the length also. It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``write()``

   [Required] Called to write to the cache. The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

Note that, apart from prepare_read(), these methods are passed a pointer to the
cache resource structure rather than the read request structure, as they could
be used in other situations where there is no read request structure, such as
writing dirty data to the cache.
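
The following userspace C sketch shows how a cache with a fixed block size
might implement the ``expand_readahead()`` and ``prepare_read()`` methods
described above. All names are hypothetical: ``mock_loff_t`` stands in for
``loff_t``, ``enum mock_read_source`` mirrors the documented return values, the
4096-byte granule and the per-granule ``present[]`` map are assumptions, and the
``cres`` argument is dropped because the sketch has no per-request cache state.

```c
#include <assert.h>
#include <stddef.h>

typedef long long mock_loff_t;		/* stand-in for loff_t */

#define MOCK_GRANULE 4096LL		/* hypothetical cache block size */

/* Mirrors the documented netfs_read_source values; hypothetical names. */
enum mock_read_source {
	MOCK_FILL_WITH_ZEROES,
	MOCK_DOWNLOAD_FROM_SERVER,
	MOCK_READ_FROM_CACHE,
	MOCK_INVALID_READ,
};

struct mock_subreq {
	mock_loff_t start;
	size_t len;
};

/* expand_readahead(): round the window out to whole cache granules. */
static void mock_expand_readahead(mock_loff_t *_start, size_t *_len,
				  mock_loff_t i_size)
{
	mock_loff_t start = *_start;
	mock_loff_t end = start + (mock_loff_t)*_len;
	mock_loff_t new_start = start - start % MOCK_GRANULE;
	mock_loff_t new_end =
		(end + MOCK_GRANULE - 1) / MOCK_GRANULE * MOCK_GRANULE;

	(void)i_size;	/* not needed for plain rounding */

	/* Only ever expand: ->len grows by at least as much as ->start shrinks. */
	assert(new_start <= start && new_end >= end);
	*_start = new_start;
	*_len = (size_t)(new_end - new_start);
}

/*
 * prepare_read(): pick the source for the next slice and clamp it to the
 * end of the current granule. present[] is a hypothetical per-granule
 * "data is in the cache" map covering nr_granules granules.
 */
static enum mock_read_source mock_prepare_read(struct mock_subreq *subreq,
					       mock_loff_t i_size,
					       const unsigned char *present,
					       size_t nr_granules)
{
	size_t g;
	mock_loff_t granule_end;

	if (subreq->len == 0)
		return MOCK_INVALID_READ;
	if (subreq->start >= i_size)
		return MOCK_FILL_WITH_ZEROES;	/* nothing on the server */

	g = (size_t)(subreq->start / MOCK_GRANULE);
	granule_end = (mock_loff_t)(g + 1) * MOCK_GRANULE;
	if (subreq->start + (mock_loff_t)subreq->len > granule_end)
		subreq->len = (size_t)(granule_end - subreq->start);

	if (g < nr_granules && present[g])
		return MOCK_READ_FROM_CACHE;
	return MOCK_DOWNLOAD_FROM_SERVER;
}
```

Rounding the window outwards satisfies the rule that ->len must grow by at
least as much as ->start shrinks. Clamping each slice to a single granule is
the simplest possible policy; a real cache might instead extend a slice across
a run of granules that are all in the same state.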