1.. SPDX-License-Identifier: GPL-2.0
2
3=================================
4Network Filesystem Helper Library
5=================================
6
7.. Contents:
8
9 - Overview.
10 - Per-inode context.
11   - Inode context helper functions.
12 - Buffered read helpers.
13   - Read helper functions.
14   - Read helper structures.
15   - Read helper operations.
16   - Read helper procedure.
17   - Read helper cache API.
18
19
20Overview
21========
22
23The network filesystem helper library is a set of functions designed to aid a
24network filesystem in implementing VM/VFS operations.  For the moment, that
25just includes turning various VM buffered read operations into requests to read
26from the server.  The helper library, however, can also interpose other
27services, such as local caching or local data encryption.
28
29Note that the library module doesn't link against local caching directly, so
30access must be provided by the netfs.
31
32
33Per-Inode Context
34=================
35
36The network filesystem helper library needs a place to store a bit of state for
37its use on each netfs inode it is helping to manage.  To this end, a context
38structure is defined::
39
40	struct netfs_inode {
41		struct inode inode;
42		const struct netfs_request_ops *ops;
43		struct fscache_cookie *cache;
44	};
45
A network filesystem that wants to use netfslib must place one of these in its
47inode wrapper struct instead of the VFS ``struct inode``.  This can be done in
48a way similar to the following::
49
50	struct my_inode {
51		struct netfs_inode netfs; /* Netfslib context and vfs inode */
52		...
53	};
54
55This allows netfslib to find its state by using ``container_of()`` from the
56inode pointer, thereby allowing the netfslib helper functions to be pointed to
57directly by the VFS/VM operation tables.
58
59The structure contains the following fields:
60
61 * ``inode``
62
63   The VFS inode structure.
64
65 * ``ops``
66
67   The set of operations provided by the network filesystem to netfslib.
68
69 * ``cache``
70
71   Local caching cookie, or NULL if no caching is enabled.  This field does not
72   exist if fscache is disabled.
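
As a sketch of how this embedding works, the following userspace fragment
recovers the containing structures from a bare inode pointer.  The structures
are cut-down stand-ins for the kernel definitions, and ``to_netfs()``,
``to_my_inode()`` and ``my_private_state`` are hypothetical names used only for
illustration::

	#include <stddef.h>

	/* Cut-down stand-ins; the real definitions live in the kernel. */
	struct inode {
		unsigned long i_ino;
	};

	struct netfs_request_ops;

	struct netfs_inode {
		struct inode inode;
		const struct netfs_request_ops *ops;
	};

	/* Userspace equivalent of the kernel's container_of(). */
	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	/* Hypothetical filesystem inode wrapper. */
	struct my_inode {
		struct netfs_inode netfs;	/* Netfslib context and vfs inode */
		int my_private_state;
	};

	/* Step from the VFS inode to the netfslib context around it. */
	struct netfs_inode *to_netfs(struct inode *inode)
	{
		return container_of(inode, struct netfs_inode, inode);
	}

	/* Step from the VFS inode to the filesystem's own wrapper. */
	struct my_inode *to_my_inode(struct inode *inode)
	{
		return container_of(to_netfs(inode), struct my_inode, netfs);
	}

Because the context is embedded rather than pointed to, both conversions are
plain pointer arithmetic and need no lookup or allocation.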
73
74
75Inode Context Helper Functions
76------------------------------
77
To help deal with the per-inode context, a number of helper functions are
79provided.  Firstly, a function to perform basic initialisation on a context and
80set the operations table pointer::
81
82	void netfs_inode_init(struct inode *inode,
83			      const struct netfs_request_ops *ops);
84
85then a function to cast from the VFS inode structure to the netfs context::
86
	struct netfs_inode *netfs_inode(struct inode *inode);
88
89and finally, a function to get the cache cookie pointer from the context
90attached to an inode (or NULL if fscache is disabled)::
91
92	struct fscache_cookie *netfs_i_cookie(struct inode *inode);
93
94
95Buffered Read Helpers
96=====================
97
98The library provides a set of read helpers that handle the ->read_folio(),
99->readahead() and much of the ->write_begin() VM operations and translate them
100into a common call framework.
101
102The following services are provided:
103
104 * Handle folios that span multiple pages.
105
106 * Insulate the netfs from VM interface changes.
107
108 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
109   don't match folio sizes or folio alignments and that may cross folios.
110
111 * Allow the netfs to expand a readahead request in both directions to meet its
112   needs.
113
114 * Allow the netfs to partially fulfil a read, which will then be resubmitted.
115
116 * Handle local caching, allowing cached data and server-read data to be
117   interleaved for a single request.
118
 * Handle clearing of buffer regions that aren't on the server.
120
121 * Handle retrying of reads that failed, switching reads from the cache to the
122   server as necessary.
123
 * In the future, this is a place where other services can be performed, such as
125   local encryption of data to be stored remotely or in the cache.
126
127From the network filesystem, the helpers require a table of operations.  This
128includes a mandatory method to issue a read operation along with a number of
129optional methods.
130
131
132Read Helper Functions
133---------------------
134
135Three read helpers are provided::
136
137	void netfs_readahead(struct readahead_control *ractl);
138	int netfs_read_folio(struct file *file,
139			   struct folio *folio);
140	int netfs_write_begin(struct file *file,
141			      struct address_space *mapping,
142			      loff_t pos,
143			      unsigned int len,
144			      struct folio **_folio,
145			      void **_fsdata);
146
147Each corresponds to a VM address space operation.  These operations use the
148state in the per-inode context.
149
For ->readahead() and ->read_folio(), the network filesystem just points directly
151at the corresponding read helper; whereas for ->write_begin(), it may be a
152little more complicated as the network filesystem might want to flush
153conflicting writes or track dirty data and needs to put the acquired folio if
154an error occurs after calling the helper.
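
The ->write_begin() error handling described above might be sketched in
userspace as follows.  Everything here is a stand-in: ``struct folio`` is
reduced to two counters, ``mock_write_begin()`` plays the part of
netfs_write_begin(), and ``my_write_begin()`` and
``my_flush_conflicting_write()`` are hypothetical filesystem functions.  The
sketch also assumes the helper hands the folio back locked::

	#include <errno.h>
	#include <stddef.h>

	/* Cut-down stand-in for the kernel's struct folio. */
	struct folio {
		int refcount;
		int locked;
	};

	/* Stand-in for the helper: returns a locked folio with a ref held. */
	int mock_write_begin(struct folio *slot, struct folio **_folio)
	{
		slot->refcount++;
		slot->locked = 1;
		*_folio = slot;
		return 0;
	}

	/* Hypothetical filesystem step that can fail after the helper. */
	int my_flush_conflicting_write(struct folio *folio, int simulate_failure)
	{
		(void)folio;
		return simulate_failure ? -EIO : 0;
	}

	int my_write_begin(struct folio *slot, struct folio **_folio,
			   int simulate_failure)
	{
		struct folio *folio;
		int ret;

		ret = mock_write_begin(slot, &folio);
		if (ret < 0)
			return ret;	/* Helper failed: nothing to release */

		ret = my_flush_conflicting_write(folio, simulate_failure);
		if (ret < 0) {
			/* Error after the helper: unlock and put the folio. */
			folio->locked = 0;
			folio->refcount--;
			return ret;
		}

		*_folio = folio;
		return 0;
	}

The point of the sketch is the asymmetry: a failure inside the helper needs no
cleanup by the filesystem, whereas a failure afterwards does.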
155
156The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
158necessary before returning for helpers that are meant to be synchronous.
159
160If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
161deal with it.  If some parts of the request are in progress when an error
162occurs, the request will get partially completed if sufficient data is read.
163
164Additionally, there is::
165
	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
				     ssize_t transferred_or_error,
				     bool was_async);
169
170which should be called to complete a read subrequest.  This is given the number
171of bytes transferred or a negative error code, plus a flag indicating whether
172the operation was asynchronous (ie. whether the follow-on processing can be
173done in the current context, given this may involve sleeping).
174
175
176Read Helper Structures
177----------------------
178
179The read helpers make use of a couple of structures to maintain the state of
180the read.  The first is a structure that manages a read request as a whole::
181
182	struct netfs_io_request {
183		struct inode		*inode;
184		struct address_space	*mapping;
185		struct netfs_cache_resources cache_resources;
186		void			*netfs_priv;
187		loff_t			start;
188		size_t			len;
189		loff_t			i_size;
190		const struct netfs_request_ops *netfs_ops;
191		unsigned int		debug_id;
192		...
193	};
194
195The above fields are the ones the netfs can use.  They are:
196
197 * ``inode``
198 * ``mapping``
199
200   The inode and the address space of the file being read from.  The mapping
201   may or may not point to inode->i_data.
202
203 * ``cache_resources``
204
205   Resources for the local cache to use, if present.
206
207 * ``netfs_priv``
208
209   The network filesystem's private data.  The value for this can be passed in
210   to the helper functions or set during the request.  The ->cleanup() op will
211   be called if this is non-NULL at the end.
212
213 * ``start``
214 * ``len``
215
216   The file position of the start of the read request and the length.  These
217   may be altered by the ->expand_readahead() op.
218
219 * ``i_size``
220
221   The size of the file at the start of the request.
222
223 * ``netfs_ops``
224
225   A pointer to the operation table.  The value for this is passed into the
226   helper functions.
227
228 * ``debug_id``
229
230   A number allocated to this operation that can be displayed in trace lines
231   for reference.
232
233
234The second structure is used to manage individual slices of the overall read
235request::
236
237	struct netfs_io_subrequest {
238		struct netfs_io_request *rreq;
239		loff_t			start;
240		size_t			len;
241		size_t			transferred;
242		unsigned long		flags;
243		unsigned short		debug_index;
244		...
245	};
246
247Each subrequest is expected to access a single source, though the helpers will
248handle falling back from one source type to another.  The members are:
249
250 * ``rreq``
251
252   A pointer to the read request.
253
254 * ``start``
255 * ``len``
256
257   The file position of the start of this slice of the read request and the
258   length.
259
260 * ``transferred``
261
262   The amount of data transferred so far of the length of this slice.  The
263   network filesystem or cache should start the operation this far into the
264   slice.  If a short read occurs, the helpers will call again, having updated
265   this to reflect the amount read so far.
266
267 * ``flags``
268
269   Flags pertaining to the read.  There are two of interest to the filesystem
270   or cache:
271
272   * ``NETFS_SREQ_CLEAR_TAIL``
273
274     This can be set to indicate that the remainder of the slice, from
275     transferred to len, should be cleared.
276
277   * ``NETFS_SREQ_SEEK_DATA_READ``
278
279     This is a hint to the cache that it might want to try skipping ahead to
280     the next data (ie. using SEEK_DATA).
281
282 * ``debug_index``
283
284   A number allocated to this slice that can be displayed in trace lines for
285   reference.
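
The way ``transferred`` and ``NETFS_SREQ_CLEAR_TAIL`` interact when a slice
terminates can be sketched as below.  This is a simplified userspace model, not
the netfslib implementation: the ``mock_`` structure, flag value, outcome enum
and function are all hypothetical, and the model treats any short read that
makes no progress as a failure::

	#include <stddef.h>
	#include <sys/types.h>	/* ssize_t */

	#define MOCK_SREQ_CLEAR_TAIL	0x1UL	/* Stand-in flag value */

	/* Cut-down stand-in for struct netfs_io_subrequest. */
	struct mock_subrequest {
		size_t		len;
		size_t		transferred;
		unsigned long	flags;
		int		error;
	};

	enum mock_outcome {
		MOCK_COMPLETE,		/* Slice fully read */
		MOCK_CLEAR_TAIL,	/* Clear from transferred to len */
		MOCK_REISSUE,		/* Short read: continue from transferred */
		MOCK_FAILED,		/* Error, or short read without progress */
	};

	enum mock_outcome mock_subreq_terminated(struct mock_subrequest *subreq,
						 ssize_t transferred_or_error)
	{
		if (transferred_or_error < 0) {
			subreq->error = (int)transferred_or_error;
			return MOCK_FAILED;
		}

		/* Account for the new data before assessing the slice. */
		subreq->transferred += (size_t)transferred_or_error;
		if (subreq->transferred >= subreq->len)
			return MOCK_COMPLETE;
		if (subreq->flags & MOCK_SREQ_CLEAR_TAIL)
			return MOCK_CLEAR_TAIL;
		if (transferred_or_error == 0)
			return MOCK_FAILED;
		return MOCK_REISSUE;
	}

A reissued slice then resumes at ``start + transferred`` for the remaining
``len - transferred`` bytes.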
286
287
288Read Helper Operations
289----------------------
290
291The network filesystem must provide the read helpers with a table of operations
292through which it can issue requests and negotiate::
293
294	struct netfs_request_ops {
295		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
296		int (*begin_cache_operation)(struct netfs_io_request *rreq);
297		void (*expand_readahead)(struct netfs_io_request *rreq);
298		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
299		void (*issue_read)(struct netfs_io_subrequest *subreq);
300		bool (*is_still_valid)(struct netfs_io_request *rreq);
301		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
302					 struct folio *folio, void **_fsdata);
303		void (*done)(struct netfs_io_request *rreq);
304		void (*cleanup)(struct address_space *mapping, void *netfs_priv);
305	};
306
307The operations are as follows:
308
309 * ``init_request()``
310
311   [Optional] This is called to initialise the request structure.  It is given
312   the file for reference and can modify the ->netfs_priv value.
313
314 * ``begin_cache_operation()``
315
316   [Optional] This is called to ask the network filesystem to call into the
317   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.
320
321   The cache gets to store its state in ->cache_resources and must set a table
322   of operations of its own there (though of a different type).
323
324   This should return 0 on success and an error code otherwise.  If an error is
325   reported, the operation may proceed anyway, just without local caching (only
326   out of memory and interruption errors cause failure here).
327
328 * ``expand_readahead()``
329
330   [Optional] This is called to allow the filesystem to expand the size of a
331   readahead read request.  The filesystem gets to expand the request in both
332   directions, though it's not permitted to reduce it as the numbers may
333   represent an allocation already made.  If local caching is enabled, it gets
334   to expand the request first.
335
336   Expansion is communicated by changing ->start and ->len in the request
337   structure.  Note that if any change is made, ->len must be increased by at
338   least as much as ->start is reduced.
339
340 * ``clamp_length()``
341
342   [Optional] This is called to allow the filesystem to reduce the size of a
343   subrequest.  The filesystem can use this, for example, to chop up a request
344   that has to be split across multiple servers or to put multiple reads in
345   flight.
346
   As the method returns bool, this should return true on success and false on
   error.
348
349 * ``issue_read()``
350
351   [Required] The helpers use this to dispatch a subrequest to the server for
352   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
353   data should be read from the server.
354
355   There is no return value; the netfs_subreq_terminated() function should be
356   called to indicate whether or not the operation succeeded and how much data
357   it transferred.  The filesystem also should not deal with setting folios
358   uptodate, unlocking them or dropping their refs - the helpers need to deal
359   with this as they have to coordinate with copying to the local cache.
360
361   Note that the helpers have the folios locked, but not pinned.  It is
362   possible to use the ITER_XARRAY iov iterator to refer to the range of the
363   inode that is being operated upon without the need to allocate large bvec
364   tables.
365
366 * ``is_still_valid()``
367
368   [Optional] This is called to find out if the data just read from the local
369   cache is still valid.  It should return true if it is still valid and false
370   if not.  If it's not still valid, it will be reread from the server.
371
372 * ``check_write_begin()``
373
374   [Optional] This is called from the netfs_write_begin() helper once it has
375   allocated/grabbed the folio to be modified to allow the filesystem to flush
376   conflicting state before allowing it to be modified.
377
378   It should return 0 if everything is now fine, -EAGAIN if the folio should be
379   regrabbed and any other error code to abort the operation.
380
 * ``done()``
382
383   [Optional] This is called after the folios in the request have all been
384   unlocked (and marked uptodate if applicable).
385
 * ``cleanup()``
387
388   [Optional] This is called as the request is being deallocated so that the
389   filesystem can clean up ->netfs_priv.
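
As an illustration of the ``expand_readahead()`` rule above (the request may
grow in both directions but must never shrink), the following userspace sketch
rounds a request outwards to a granule size.  ``struct mock_request``,
``mock_loff_t`` and ``my_expand_readahead()`` are hypothetical stand-ins, and
the sketch assumes a non-negative start and a non-zero granule::

	#include <stddef.h>

	typedef long long mock_loff_t;	/* Stand-in for the kernel's loff_t */

	/* Cut-down stand-in for ->start and ->len in the request. */
	struct mock_request {
		mock_loff_t	start;
		size_t		len;
	};

	/*
	 * Grow the request outwards to whole multiples of @granule.  The start
	 * only moves down and the end only moves up, so ->len always increases
	 * by at least as much as ->start is reduced.
	 */
	void my_expand_readahead(struct mock_request *rreq, size_t granule)
	{
		mock_loff_t g = (mock_loff_t)granule;
		mock_loff_t end = rreq->start + (mock_loff_t)rreq->len;

		rreq->start -= rreq->start % g;		/* Round start down */
		end += (g - end % g) % g;		/* Round end up */
		rreq->len = (size_t)(end - rreq->start);
	}

For example, a request for 3000 bytes at offset 5000 with a 4096-byte granule
becomes a request for 4096 bytes at offset 4096.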
390
391
392
393Read Helper Procedure
394---------------------
395
396The read helpers work by the following general procedure:
397
398 * Set up the request.
399
400 * For readahead, allow the local cache and then the network filesystem to
401   propose expansions to the read request.  This is then proposed to the VM.
402   If the VM cannot fully perform the expansion, a partially expanded read will
403   be performed, though this may not get written to the cache in its entirety.
404
405 * Loop around slicing chunks off of the request to form subrequests:
406
407   * If a local cache is present, it gets to do the slicing, otherwise the
408     helpers just try to generate maximal slices.
409
410   * The network filesystem gets to clamp the size of each slice if it is to be
411     the source.  This allows rsize and chunking to be implemented.
412
   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.
415
416   * The next slice begins at the end of the last one.
417
418   * As slices finish being read, they terminate.
419
420 * When all the subrequests have terminated, the subrequests are assessed and
421   any that are short or have failed are reissued:
422
423   * Failed cache requests are issued against the server instead.
424
425   * Failed server requests just fail.
426
427   * Short reads against either source will be reissued against that source
428     provided they have transferred some more data:
429
430     * The cache may need to skip holes that it can't do DIO from.
431
432     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
433       end of the slice instead of reissuing.
434
 * Once the data is read, the folios that have been fully read/cleared will be:

   * Marked uptodate.

   * Marked with PG_fscache, if a cache is present.

   * Unlocked.
442
443 * Any folios that need writing to the cache will then have DIO writes issued.
444
445 * Synchronous operations will wait for reading to be complete.
446
447 * Writes to the cache will proceed asynchronously and the folios will have the
448   PG_fscache mark removed when that completes.
449
450 * The request structures will be cleaned up when everything has completed.
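
The slicing loop in the procedure above can be sketched for the no-cache case
as follows.  This is a userspace model under stated assumptions, not the
netfslib code: ``struct mock_slice``, ``my_clamp_length()`` and
``mock_slice_request()`` are hypothetical, and the clamp simply applies an
rsize-style limit and reports success as a bool::

	#include <stdbool.h>
	#include <stddef.h>

	typedef long long mock_loff_t;	/* Stand-in for the kernel's loff_t */

	/* Cut-down stand-in for the ->start and ->len of a subrequest. */
	struct mock_slice {
		mock_loff_t	start;
		size_t		len;
	};

	/* Hypothetical clamp: limit a slice to @rsize bytes. */
	bool my_clamp_length(struct mock_slice *slice, size_t rsize)
	{
		if (slice->len > rsize)
			slice->len = rsize;
		return true;
	}

	/*
	 * Slice [start, start + len) into at most @max subrequests and return
	 * the number generated.  Each slice starts out maximal and is then
	 * clamped; the next slice begins where the last one ended.
	 */
	size_t mock_slice_request(mock_loff_t start, size_t len, size_t rsize,
				  struct mock_slice *slices, size_t max)
	{
		size_t n = 0;

		while (len > 0 && n < max) {
			struct mock_slice *s = &slices[n];

			s->start = start;
			s->len = len;
			if (!my_clamp_length(s, rsize) || s->len == 0)
				break;	/* Give up slicing at this point */
			start += (mock_loff_t)s->len;
			len -= s->len;
			n++;
		}
		return n;
	}

With an rsize of 4096, a 10000-byte request at offset 0 yields slices of 4096,
4096 and 1808 bytes.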
451
452
453Read Helper Cache API
454---------------------
455
456When implementing a local cache to be used by the read helpers, two things are
457required: some way for the network filesystem to initialise the caching for a
458read request and a table of operations for the helpers to call.
459
460The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::
463
464	int fscache_begin_read_operation(struct netfs_io_request *rreq,
465					 struct fscache_cookie *cookie);
466
467passing in the request pointer and the cookie corresponding to the file.
468
469The netfs_io_request object contains a place for the cache to hang its
470state::
471
472	struct netfs_cache_resources {
473		const struct netfs_cache_ops	*ops;
474		void				*cache_priv;
475		void				*cache_priv2;
476	};
477
478This contains an operations table pointer and two private pointers.  The
479operation table looks like the following::
480
481	struct netfs_cache_ops {
482		void (*end_operation)(struct netfs_cache_resources *cres);
483
484		void (*expand_readahead)(struct netfs_cache_resources *cres,
485					 loff_t *_start, size_t *_len, loff_t i_size);
486
487		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
488						       loff_t i_size);
489
490		int (*read)(struct netfs_cache_resources *cres,
491			    loff_t start_pos,
492			    struct iov_iter *iter,
493			    bool seek_data,
494			    netfs_io_terminated_t term_func,
495			    void *term_func_priv);
496
497		int (*prepare_write)(struct netfs_cache_resources *cres,
498				     loff_t *_start, size_t *_len, loff_t i_size,
499				     bool no_space_allocated_yet);
500
501		int (*write)(struct netfs_cache_resources *cres,
502			     loff_t start_pos,
503			     struct iov_iter *iter,
504			     netfs_io_terminated_t term_func,
505			     void *term_func_priv);
506
507		int (*query_occupancy)(struct netfs_cache_resources *cres,
508				       loff_t start, size_t len, size_t granularity,
509				       loff_t *_data_start, size_t *_data_len);
510	};
511
512With a termination handler function pointer::
513
514	typedef void (*netfs_io_terminated_t)(void *priv,
515					      ssize_t transferred_or_error,
516					      bool was_async);
517
518The methods defined in the table are:
519
520 * ``end_operation()``
521
522   [Required] Called to clean up the resources at the end of the read request.
523
524 * ``expand_readahead()``
525
526   [Optional] Called at the beginning of a netfs_readahead() operation to allow
527   the cache to expand a request in either direction.  This allows the cache to
528   size the request appropriately for the cache granularity.
529
   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.
548
549 * ``read()``
550
551   [Required] Called to read from the cache.  The start file offset is given
552   along with an iterator to read to, which gives the length also.  It can be
553   given a hint requesting that it seek forward from that start position for
554   data.
555
556   Also provided is a pointer to a termination handler function and private
557   data to pass to that function.  The termination function should be called
558   with the number of bytes transferred or an error code, plus a flag
559   indicating whether the termination is definitely happening in the caller's
560   context.
561
562 * ``prepare_write()``
563
564   [Required] Called to prepare a write to the cache to take place.  This
565   involves checking to see whether the cache has sufficient space to honour
566   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
567   region can be shrunk or it can be expanded to a page boundary either way as
568   necessary to align for direct I/O.  i_size holds the size of the object and
569   is provided for reference.  no_space_allocated_yet is set to true if the
570   caller is certain that no data has been written to that region - for example
571   if it tried to do a read from there already.
572
573 * ``write()``
574
575   [Required] Called to write to the cache.  The start file offset is given
576   along with an iterator to write from, which gives the length also.
577
578   Also provided is a pointer to a termination handler function and private
579   data to pass to that function.  The termination function should be called
580   with the number of bytes transferred or an error code, plus a flag
581   indicating whether the termination is definitely happening in the caller's
582   context.
583
584 * ``query_occupancy()``
585
586   [Required] Called to find out where the next piece of data is within a
587   particular region of the cache.  The start and length of the region to be
588   queried are passed in, along with the granularity to which the answer needs
589   to be aligned.  The function passes back the start and length of the data,
590   if any, available within that region.  Note that there may be a hole at the
591   front.
592
593   It returns 0 if some data was found, -ENODATA if there was no usable data
594   within the region or -ENOBUFS if there is no caching on this file.
595
596Note that these methods are passed a pointer to the cache resource structure,
597not the read request structure as they could be used in other situations where
598there isn't a read request structure as well, such as writing dirty data to the
599cache.
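
To make the ``query_occupancy()`` contract concrete, here is a userspace sketch
for a hypothetical cache that records which granules it holds in a byte array.
Nothing here reflects how a real cache backend stores its data:
``mock_query_occupancy()``, ``mock_loff_t`` and the ``present`` array are
assumptions, a NULL array stands for "no caching on this file", and the start
and length are assumed to be multiples of a non-zero granularity::

	#include <errno.h>
	#include <stddef.h>

	typedef long long mock_loff_t;	/* Stand-in for the kernel's loff_t */

	int mock_query_occupancy(const unsigned char *present, size_t nr_granules,
				 mock_loff_t start, size_t len, size_t granularity,
				 mock_loff_t *_data_start, size_t *_data_len)
	{
		size_t first, last, i, j;

		if (!present)
			return -ENOBUFS;	/* No caching on this file */

		first = (size_t)start / granularity;
		last = first + len / granularity;	/* One past the region */
		if (last > nr_granules)
			last = nr_granules;

		for (i = first; i < last; i++) {
			if (!present[i])
				continue;	/* Hole at the front */

			/* Extend over the run of contiguous cached granules. */
			for (j = i; j < last && present[j]; j++)
				;
			*_data_start = (mock_loff_t)(i * granularity);
			*_data_len = (j - i) * granularity;
			return 0;
		}

		return -ENODATA;	/* No usable data in the region */
	}

Only the first run of data is reported, so a caller wanting everything in the
region would query again from the end of the returned run.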
600
601
602API Function Reference
603======================
604
605.. kernel-doc:: include/linux/netfs.h
606.. kernel-doc:: fs/netfs/buffered_read.c
607.. kernel-doc:: fs/netfs/io.c
608