.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations. For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server. The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage. To this end, a context
structure is defined::

	struct netfs_inode {
		struct inode inode;
		const struct netfs_request_ops *ops;
		struct fscache_cookie *cache;
	};

A network filesystem that wants to use netfslib must place one of these in its
inode wrapper struct instead of the VFS ``struct inode``. This can be done in
a way similar to the following::

	struct my_inode {
		struct netfs_inode netfs; /* Netfslib context and vfs inode */
		...
	};

This allows netfslib to find its state by using ``container_of()`` from the
inode pointer, thereby allowing the netfslib helper functions to be pointed to
directly by the VFS/VM operation tables.

The structure contains the following fields:

 * ``inode``

   The VFS inode structure.

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled. This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided. Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

	void netfs_inode_init(struct inode *inode,
	                      const struct netfs_request_ops *ops);

then a function to cast from the VFS inode structure to the netfs context::

	struct netfs_inode *netfs_inode(struct inode *inode);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

	struct fscache_cookie *netfs_i_cookie(struct inode *inode);


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->read_folio(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations. This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

	void netfs_readahead(struct readahead_control *ractl);
	int netfs_read_folio(struct file *file,
	                     struct folio *folio);
	int netfs_write_begin(struct file *file,
	                      struct address_space *mapping,
	                      loff_t pos,
	                      unsigned int len,
	                      struct folio **_folio,
	                      void **_fsdata);

Each corresponds to a VM address space operation. These operations use the
state in the per-inode context.

For ->readahead() and ->read_folio(), the network filesystem just points
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations. Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it. If some parts of the request are in progress when an error
occurs, the request will get partially completed if sufficient data is read.

Additionally, there is::

	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
	                             ssize_t transferred_or_error,
	                             bool was_async);

which should be called to complete a read subrequest.
This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (ie. whether the follow-on processing can be
done in the current context, given this may involve sleeping).


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read. The first is a structure that manages a read request as a whole::

	struct netfs_io_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};

The above fields are the ones the netfs can use. They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from. The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data. The value for this can be passed in
   to the helper functions or set during the request. The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length. These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table. The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.

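
As an illustration of how ``start`` and ``len`` might be altered by an
expansion step, the following is a minimal userspace sketch and not kernel
code. ``struct mock_request``, ``mock_expand_readahead()`` and the granule
size are hypothetical names introduced here; only the three fields mirror the
request structure described above. The sketch assumes a non-negative start and
rounds the request outwards to whole granules so that the expanded range still
covers the original one:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the start, len and i_size fields of
 * struct netfs_io_request.  This is not the kernel structure.
 */
struct mock_request {
	long long	start;	/* File position of the start of the read */
	size_t		len;	/* Length of the read */
	long long	i_size;	/* File size at the start of the request */
};

/*
 * Expand a request outwards to whole granules, as an expansion step might.
 * The start is rounded down and the end is rounded up, so the request is
 * never reduced.  Assumes rreq->start >= 0 and granule > 0.
 */
static void mock_expand_readahead(struct mock_request *rreq, size_t granule)
{
	long long g = (long long)granule;
	long long old_end = rreq->start + (long long)rreq->len;
	long long new_start = rreq->start - rreq->start % g;	/* round down */
	long long new_end = (old_end + g - 1) / g * g;		/* round up */

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);

	/* The expanded range must still reach the original end. */
	assert(rreq->start + (long long)rreq->len >= old_end);
}
```

For example, a request with a start of 5000 and a len of 3000, expanded to a
4096-byte granule, would become a start of 4096 and a len of 4096, which still
covers the original range of 5000 to 8000.
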


The second structure is used to manage individual slices of the overall read
request::

	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another. The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice. The
   network filesystem or cache should start the operation this far into the
   slice. If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read. There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.

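
The way ``transferred`` and the clear-tail flag interact on a short read can be
sketched in userspace as follows. This is an illustration under stated
assumptions rather than the helpers' implementation: ``struct
mock_subrequest``, ``MOCK_SREQ_CLEAR_TAIL`` and ``mock_read_done()`` are
hypothetical names, and the buffer is assumed to cover exactly the slice from
``start`` to ``start + len``:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the NETFS_SREQ_CLEAR_TAIL flag. */
#define MOCK_SREQ_CLEAR_TAIL	0x1UL

/* Hypothetical userspace stand-in for some struct netfs_io_subrequest fields. */
struct mock_subrequest {
	long long	start;		/* File position of the slice */
	size_t		len;		/* Length of the slice */
	size_t		transferred;	/* Amount transferred so far */
	unsigned long	flags;
};

/*
 * Account for a read attempt that placed 'got' more bytes into 'buf' at
 * offset subreq->transferred.  Returns 1 if the slice is now complete, or 0
 * if the read should be issued again, resuming 'transferred' bytes into the
 * slice.
 */
static int mock_read_done(struct mock_subrequest *subreq,
			  unsigned char *buf, size_t got)
{
	assert(got <= subreq->len - subreq->transferred);
	subreq->transferred += got;

	if (subreq->transferred == subreq->len)
		return 1;

	if (subreq->flags & MOCK_SREQ_CLEAR_TAIL) {
		/* Clear from transferred to len rather than reading again. */
		memset(buf + subreq->transferred, 0,
		       subreq->len - subreq->transferred);
		subreq->transferred = subreq->len;
		return 1;
	}

	return 0;
}
```

In this sketch, a short read without the flag leaves the slice incomplete so
that the caller can resume from ``start + transferred``, whereas a short read
with the flag set zero-fills the remainder and completes the slice.
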


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		int (*begin_cache_operation)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
		                         struct folio *folio, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
		void (*cleanup)(struct address_space *mapping, void *netfs_priv);
	};

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure. It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read. The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise. If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.
   The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made. If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure. Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest. The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading. In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred. The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned. It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid. It should return true if it is still valid and false
   if not. If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the folio should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.



Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request. This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source. This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work. If using fscache, for
example, the network filesystem would call::

	int fscache_begin_read_operation(struct netfs_io_request *rreq,
	                                 struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_io_request object contains a place for the cache to hang its
state::

	struct netfs_cache_resources {
		const struct netfs_cache_ops	*ops;
		void				*cache_priv;
		void				*cache_priv2;
	};

This contains an operations table pointer and two private pointers.
The operation table looks like the following::

	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
		                         loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
		                                     loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
		            loff_t start_pos,
		            struct iov_iter *iter,
		            bool seek_data,
		            netfs_io_terminated_t term_func,
		            void *term_func_priv);

		int (*prepare_write)(struct netfs_cache_resources *cres,
		                     loff_t *_start, size_t *_len, loff_t i_size,
		                     bool no_space_allocated_yet);

		int (*write)(struct netfs_cache_resources *cres,
		             loff_t start_pos,
		             struct iov_iter *iter,
		             netfs_io_terminated_t term_func,
		             void *term_func_priv);

		int (*query_occupancy)(struct netfs_cache_resources *cres,
		                       loff_t start, size_t len, size_t granularity,
		                       loff_t *_data_start, size_t *_data_len);
	};

With a termination handler function pointer::

	typedef void (*netfs_io_terminated_t)(void *priv,
	                                      ssize_t transferred_or_error,
	                                      bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction. This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache. The start file offset is given
   along with an iterator to read to, which gives the length also. It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place. This
   involves checking to see whether the cache has sufficient space to honour
   the write. ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O. i_size holds the size of the object and
   is provided for reference. no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.
   The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache. The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned. The function passes back the start and length of the data,
   if any, available within that region. Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.


API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c