.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage.  To this end, a context
structure is defined::

	struct netfs_i_context {
		const struct netfs_request_ops *ops;
		struct fscache_cookie *cache;
	};

A network filesystem that wants to use netfs lib must place one of these
directly after the VFS ``struct inode`` it allocates, usually as part of its
own struct.  This can be done in a way similar to the following::

	struct my_inode {
		struct {
			/* These must be contiguous */
			struct inode vfs_inode;
			struct netfs_i_context netfs_ctx;
		};
		...
	};

This allows netfslib to find its state by simple offset from the inode pointer,
thereby allowing the netfslib helper functions to be pointed to directly by the
VFS/VM operation tables.

The structure contains the following fields:

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled.  This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

	void netfs_i_context_init(struct inode *inode,
				  const struct netfs_request_ops *ops);

then two functions to cast between the VFS inode structure and the netfs
context::

	struct netfs_i_context *netfs_i_context(struct inode *inode);
	struct inode *netfs_inode(struct netfs_i_context *ctx);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

	struct fscache_cookie *netfs_i_cookie(struct inode *inode);


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->readpage(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

	void netfs_readahead(struct readahead_control *ractl);
	int netfs_readpage(struct file *file,
			   struct page *page);
	int netfs_write_begin(struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      unsigned int flags,
			      struct folio **_folio,
			      void **_fsdata);

Each corresponds to a VM address space operation.  These operations use the
state in the per-inode context.

For ->readahead() and ->readpage(), the network filesystem just points directly
at the corresponding read helper; whereas for ->write_begin(), it may be a
little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it.  If some parts of the request are in progress when an error
occurs, the request will get partially completed if sufficient data is read.

Additionally, there is::

	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
				     ssize_t transferred_or_error,
				     bool was_async);

which should be called to complete a read subrequest.
This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (i.e. whether the follow-on processing can be
done in the current context, given this may involve sleeping).


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

	struct netfs_io_request {
		struct inode *inode;
		struct address_space *mapping;
		struct netfs_cache_resources cache_resources;
		void *netfs_priv;
		loff_t start;
		size_t len;
		loff_t i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int debug_id;
		...
	};

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.  The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.
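
As an illustration of how ``start`` and ``len`` might be adjusted by an
->expand_readahead() implementation, the following is a minimal userspace
sketch, not kernel code.  ``struct model_request``,
``model_expand_readahead()`` and the granule size are hypothetical names
invented for this example; it assumes the filesystem simply wants requests
rounded outwards to a fixed granule, so the request can only grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the start/len fields of a request. */
struct model_request {
	int64_t start;	/* models the loff_t file position */
	size_t len;	/* length of the request in bytes */
};

/*
 * Round the request outwards to a multiple of an assumed granule size.
 * The start can only move down and the end can only move up, so len grows
 * by at least as much as start is reduced.
 */
static void model_expand_readahead(struct model_request *rreq, size_t granule)
{
	int64_t g, end, new_start, new_end;

	assert(granule > 0 && rreq->start >= 0);

	g = (int64_t)granule;
	end = rreq->start + (int64_t)rreq->len;
	new_start = rreq->start - (rreq->start % g);	/* round start down */
	new_end = ((end + g - 1) / g) * g;		/* round end up */

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);
}
```

With a 4096-byte granule, a request covering bytes 5000-7999 becomes one
covering bytes 4096-8191: ``start`` drops by 904 and ``len`` grows by 1096, so
the originally requested range is still covered.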


The second structure is used to manage individual slices of the overall read
request::

	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t start;
		size_t len;
		size_t transferred;
		unsigned long flags;
		unsigned short debug_index;
		...
	};

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice.  The
   network filesystem or cache should start the operation this far into the
   slice.  If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (i.e. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
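
The relationship between ``start``, ``len`` and ``transferred`` can be sketched
in userspace as follows.  This is only a model under stated assumptions:
``struct model_subrequest``, the ``model_*()`` helpers and the value of
``MODEL_SREQ_CLEAR_TAIL`` are hypothetical and do not reproduce the kernel's
definitions.  It shows where a resumed read would begin, how much of the slice
is outstanding, and what clearing the tail of a slice amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SREQ_CLEAR_TAIL 0x1UL	/* hypothetical flag value */

/* Hypothetical userspace stand-in for the fields of a subrequest. */
struct model_subrequest {
	int64_t start;		/* file position of the slice */
	size_t len;		/* length of the slice */
	size_t transferred;	/* bytes transferred so far */
	unsigned long flags;
};

/* File position at which the next read for this slice should begin. */
static int64_t model_resume_pos(const struct model_subrequest *s)
{
	return s->start + (int64_t)s->transferred;
}

/* Number of bytes of the slice that have not been transferred yet. */
static size_t model_remaining(const struct model_subrequest *s)
{
	assert(s->transferred <= s->len);
	return s->len - s->transferred;
}

/*
 * If the clear-tail flag is set, zero the part of the slice buffer from
 * transferred to len and mark the slice as complete.  buf must cover the
 * whole slice.  Returns the number of bytes cleared.
 */
static size_t model_clear_tail(struct model_subrequest *s, unsigned char *buf)
{
	size_t todo = model_remaining(s);

	if (!(s->flags & MODEL_SREQ_CLEAR_TAIL))
		return 0;
	memset(buf + s->transferred, 0, todo);
	s->transferred = s->len;
	return todo;
}
```

For a slice at file position 100 of length 8 with 3 bytes already transferred,
a resumed read starts at position 103 with 5 bytes outstanding; with the flag
set, those 5 bytes are zeroed instead.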


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		int (*begin_cache_operation)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct folio *folio, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
		void (*cleanup)(struct address_space *mapping, void *netfs_priv);
	};

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.  If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.
   The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the folio should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.



Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::

	int fscache_begin_read_operation(struct netfs_io_request *rreq,
					 struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_io_request object contains a place for the cache to hang its
state::

	struct netfs_cache_resources {
		const struct netfs_cache_ops *ops;
		void *cache_priv;
		void *cache_priv2;
	};

This contains an operations table pointer and two private pointers.
The operation table looks like the following::

	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
						     loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*prepare_write)(struct netfs_cache_resources *cres,
				     loff_t *_start, size_t *_len, loff_t i_size,
				     bool no_space_allocated_yet);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);

		int (*query_occupancy)(struct netfs_cache_resources *cres,
				       loff_t start, size_t len, size_t granularity,
				       loff_t *_data_start, size_t *_data_len);
	};

With a termination handler function pointer::

	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  i_size holds the size of the object and
   is provided for reference.  no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.
   The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure, as they could also be used in other situations
where there isn't a read request structure, such as writing dirty data to the
cache.


API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c