.. SPDX-License-Identifier: GPL-2.0

==========================
General Filesystem Caching
==========================

Overview
========

This facility is a general purpose cache for network filesystems, though it
could be used for caching other things such as ISO9660 filesystems too.

FS-Cache mediates between cache backends (such as CacheFS) and network
filesystems::

	+---------+
	|         |                        +--------------+
	|   NFS   |--+                     |              |
	|         |  |                 +-->|   CacheFS    |
	+---------+  |   +----------+  |   |  /dev/hda5   |
	             |   |          |  |   +--------------+
	+---------+  +-->|          |  |
	|         |      |          |--+
	|   AFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+  +-->|          |  |
	             |   |          |  |   +--------------+
	+---------+  |   +----------+  |   |              |
	|         |  |                 +-->|  CacheFiles  |
	|  ISOFS  |--+                     |  /var/cache  |
	|         |                        +--------------+
	+---------+

Or to look at it another way, FS-Cache is a module that provides a caching
facility to a network filesystem such that the cache is transparent to the
user::

	+---------+
	|         |
	| Server  |
	|         |
	+---------+
	     |                  NETWORK
	~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	     |
	     |           +----------+
	     V           |          |
	+---------+      |          |
	|         |      |          |
	|   NFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+      |          |  |   +--------------+   +--------------+
	     |           |          |  |   |              |   |              |
	     V           +----------+  +-->|  CacheFiles  |-->|  Ext3        |
	+---------+                        |  /var/cache  |   |  /dev/sda6   |
	|         |                        +--------------+   +--------------+
	|   VFS   |                                ^                     ^
	|         |                                |                     |
	+---------+                                +--------------+      |
	     |                  KERNEL SPACE                      |      |
	~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
	     |                  USER SPACE                        |      |
	     V                                                    |      |
	+---------+                                           +--------------+
	|         |                                           |              |
	| Process |                                           | cachefilesd  |
	|         |                                           |              |
	+---------+                                           +--------------+


FS-Cache does not follow the idea of completely loading every netfs file
opened in its entirety into a cache before permitting it to be accessed and
then serving the pages out of that cache rather than the netfs inode because:

 (1) It must be practical to operate without a cache.

 (2) The size of any accessible file must not be limited to the size of the
     cache.

 (3) The combined size of all opened files (this includes mapped libraries)
     must not be limited to the size of the cache.

 (4) The user should not be forced to download an entire file just to do a
     one-off access of a small portion of it (such as might be done with the
     "file" program).

It instead serves the cache out in PAGE_SIZE chunks as and when requested by
the netfs('s) using it.

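As a rough illustration of what serving a file in PAGE_SIZE chunks means for a
partial access, the sketch below works out which page indices a byte range
touches.  It is not kernel code: the function name is made up for this example
and PAGE_SIZE is assumed to be 4096, whereas the real value depends on the
architecture and kernel configuration.

```python
# Illustrative only: PAGE_SIZE is an assumed value, not read from the kernel.
PAGE_SIZE = 4096


def pages_for_range(offset, length, page_size=PAGE_SIZE):
    """Return the page indices touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return list(range(first, last + 1))


# A small read near the start of a file only needs the first page.
print(pages_for_range(0, 100))
# A read that straddles a page boundary needs both neighbouring pages.
print(pages_for_range(4000, 200))
```

Only the pages returned here would need to be present in the cache (or fetched
from the server) to satisfy the access; the rest of the file is left alone.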

FS-Cache provides the following facilities:

 (1) More than one cache can be used at once.  Caches can be selected
     explicitly by use of tags.

 (2) Caches can be added / removed at any time.

 (3) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (2)).

 (4) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (5) Cookies are used to represent indices, files and other objects to the
     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (6) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

 (7) Data I/O is done direct to and from the netfs's pages.  The netfs
     indicates that page A is at index B of the data-file represented by cookie
     C, and that it should be read or written.  The cache backend may or may
     not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion.  The I/O may be either synchronous or
     asynchronous.

 (8) Cookies can be "retired" upon release.  At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

 (9) The netfs provides a "match" function for index searches.  In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.

(10) As much as possible is done asynchronously.


FS-Cache maintains a virtual indexing tree in which all indices, files, objects
and pages are kept.  Bits of this tree may actually reside in one or more
caches::

                                            FSDEF
                                              |
                         +------------------------------------+
                         |                                    |
                        NFS                                  AFS
                         |                                    |
            +--------------------------+                +-----------+
            |                          |                |           |
         homedir                     mirror          afs.org   redhat.com
            |                          |                            |
      +------------+           +---------------+              +----------+
      |            |           |               |              |          |
    00001        00002       00007           00125        vol00001   vol00002
      |            |           |               |                         |
  +---+---+     +-----+      +---+      +------+------+            +-----+----+
  |   |   |     |     |      |   |      |      |      |            |     |    |
 PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                      |                                            |
                     PG0                                       +-------+
                                                               |       |
                                                             00001   00003
                                                               |
                                                           +---+---+
                                                           |   |   |
                                                          PG0 PG1 PG2

In the example above, you can see two netfs's being backed: NFS and AFS.  These
have different index hierarchies:

   * The NFS primary index contains per-server indices.  Each server index is
     indexed by NFS file handles to get data file objects.  Each data file
     object can have an array of pages, but may also have further child
     objects, such as extended attributes and directory entries.  Extended
     attribute objects themselves have page-array contents.

   * The AFS primary index contains per-cell indices.  Each cell index contains
     per-logical-volume indices.  Each volume index contains up to three
     indices for the read-write, read-only and backup mirrors of those volumes.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children.  Any index with non-index object children will be assumed to only
reside in one cache.

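The rule about which indices may reside in more than one cache can be modelled
with a small tree.  This is only a sketch of the rule as stated above, not of
how FS-Cache represents objects internally; the ``Node`` class and the
``may_span_caches()`` helper are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for an object in the virtual indexing tree."""
    name: str
    is_index: bool
    children: List["Node"] = field(default_factory=list)


def may_span_caches(node):
    """An index may reside in several caches only if all its children are indices."""
    return node.is_index and all(child.is_index for child in node.children)


# AFS cell index from the diagram: its children are volume indices.
cell = Node("redhat.com", True, [Node("vol00001", True), Node("vol00002", True)])

# NFS server index from the diagram: its children are data file objects.
homedir = Node("homedir", True, [Node("00001", False), Node("00002", False)])

print(may_span_caches(cell))     # children are all indices
print(may_span_caches(homedir))  # has non-index children
```

In this model the ``redhat.com`` cell index could be spread over several caches,
whereas the ``homedir`` server index is taken to live in exactly one.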

The netfs API to FS-Cache can be found in:

	Documentation/filesystems/caching/netfs-api.rst

The cache backend API to FS-Cache can be found in:

	Documentation/filesystems/caching/backend-api.rst

A description of the internal representations and object state machine can be
found in:

	Documentation/filesystems/caching/object.rst


Statistical Information
=======================

If FS-Cache is compiled with the following options enabled::

	CONFIG_FSCACHE_STATS=y
	CONFIG_FSCACHE_HISTOGRAM=y

then it will gather certain statistics and display them through a number of
proc files.

/proc/fs/fscache/stats
----------------------

     This shows counts of a number of events that can happen in FS-Cache:

+--------------+-------+-------------------------------------------------------+
|CLASS         |EVENT  |MEANING                                                |
+==============+=======+=======================================================+
|Cookies       |idx=N  |Number of index cookies allocated                      |
+              +-------+-------------------------------------------------------+
|              |dat=N  |Number of data storage cookies allocated               |
+              +-------+-------------------------------------------------------+
|              |spc=N  |Number of special cookies allocated                    |
+--------------+-------+-------------------------------------------------------+
|Objects       |alc=N  |Number of objects allocated                            |
+              +-------+-------------------------------------------------------+
|              |nal=N  |Number of object allocation failures                   |
+              +-------+-------------------------------------------------------+
|              |avl=N  |Number of objects that reached the available state     |
+              +-------+-------------------------------------------------------+
|              |ded=N  |Number of objects that reached the dead state          |
+--------------+-------+-------------------------------------------------------+
|ChkAux        |non=N  |Number of objects that didn't have a coherency check   |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of objects that passed a coherency check        |
+              +-------+-------------------------------------------------------+
|              |upd=N  |Number of objects that needed a coherency data update  |
+              +-------+-------------------------------------------------------+
|              |obs=N  |Number of objects that were declared obsolete          |
+--------------+-------+-------------------------------------------------------+
|Pages         |mrk=N  |Number of pages marked as being cached                 |
+              +-------+-------------------------------------------------------+
|              |unc=N  |Number of uncache page requests seen                   |
+--------------+-------+-------------------------------------------------------+
|Acquire       |n=N    |Number of acquire cookie requests seen                 |
+              +-------+-------------------------------------------------------+
|              |nul=N  |Number of acq reqs given a NULL parent                 |
+              +-------+-------------------------------------------------------+
|              |noc=N  |Number of acq reqs rejected due to no cache available  |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of acq reqs succeeded                           |
+              +-------+-------------------------------------------------------+
|              |nbf=N  |Number of acq reqs rejected due to error               |
+              +-------+-------------------------------------------------------+
|              |oom=N  |Number of acq reqs failed on ENOMEM                    |
+--------------+-------+-------------------------------------------------------+
|Lookups       |n=N    |Number of lookup calls made on cache backends          |
+              +-------+-------------------------------------------------------+
|              |neg=N  |Number of negative lookups made                        |
+              +-------+-------------------------------------------------------+
|              |pos=N  |Number of positive lookups made                        |
+              +-------+-------------------------------------------------------+
|              |crt=N  |Number of objects created by lookup                    |
+              +-------+-------------------------------------------------------+
|              |tmo=N  |Number of lookups timed out and requeued               |
+--------------+-------+-------------------------------------------------------+
|Updates       |n=N    |Number of update cookie requests seen                  |
+              +-------+-------------------------------------------------------+
|              |nul=N  |Number of upd reqs given a NULL parent                 |
+              +-------+-------------------------------------------------------+
|              |run=N  |Number of upd reqs granted CPU time                    |
+--------------+-------+-------------------------------------------------------+
|Relinqs       |n=N    |Number of relinquish cookie requests seen              |
+              +-------+-------------------------------------------------------+
|              |nul=N  |Number of rlq reqs given a NULL parent                 |
+              +-------+-------------------------------------------------------+
|              |wcr=N  |Number of rlq reqs waited on completion of creation    |
+--------------+-------+-------------------------------------------------------+
|AttrChg       |n=N    |Number of attribute changed requests seen              |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of attr changed requests queued                 |
+              +-------+-------------------------------------------------------+
|              |nbf=N  |Number of attr changed rejected -ENOBUFS               |
+              +-------+-------------------------------------------------------+
|              |oom=N  |Number of attr changed failed -ENOMEM                  |
+              +-------+-------------------------------------------------------+
|              |run=N  |Number of attr changed ops given CPU time              |
+--------------+-------+-------------------------------------------------------+
|Allocs        |n=N    |Number of allocation requests seen                     |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of successful alloc reqs                        |
+              +-------+-------------------------------------------------------+
|              |wt=N   |Number of alloc reqs that waited on lookup completion  |
+              +-------+-------------------------------------------------------+
|              |nbf=N  |Number of alloc reqs rejected -ENOBUFS                 |
+              +-------+-------------------------------------------------------+
|              |int=N  |Number of alloc reqs aborted -ERESTARTSYS              |
+              +-------+-------------------------------------------------------+
|              |ops=N  |Number of alloc reqs submitted                         |
+              +-------+-------------------------------------------------------+
|              |owt=N  |Number of alloc reqs waited for CPU time               |
+              +-------+-------------------------------------------------------+
|              |abt=N  |Number of alloc reqs aborted due to object death       |
+--------------+-------+-------------------------------------------------------+
|Retrvls       |n=N    |Number of retrieval (read) requests seen               |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of successful retr reqs                         |
+              +-------+-------------------------------------------------------+
|              |wt=N   |Number of retr reqs that waited on lookup completion   |
+              +-------+-------------------------------------------------------+
|              |nod=N  |Number of retr reqs returned -ENODATA                  |
+              +-------+-------------------------------------------------------+
|              |nbf=N  |Number of retr reqs rejected -ENOBUFS                  |
+              +-------+-------------------------------------------------------+
|              |int=N  |Number of retr reqs aborted -ERESTARTSYS               |
+              +-------+-------------------------------------------------------+
|              |oom=N  |Number of retr reqs failed -ENOMEM                     |
+              +-------+-------------------------------------------------------+
|              |ops=N  |Number of retr reqs submitted                          |
+              +-------+-------------------------------------------------------+
|              |owt=N  |Number of retr reqs waited for CPU time                |
+              +-------+-------------------------------------------------------+
|              |abt=N  |Number of retr reqs aborted due to object death        |
+--------------+-------+-------------------------------------------------------+
|Stores        |n=N    |Number of storage (write) requests seen                |
+              +-------+-------------------------------------------------------+
|              |ok=N   |Number of successful store reqs                        |
+              +-------+-------------------------------------------------------+
|              |agn=N  |Number of store reqs on a page already pending storage |
+              +-------+-------------------------------------------------------+
|              |nbf=N  |Number of store reqs rejected -ENOBUFS                 |
+              +-------+-------------------------------------------------------+
|              |oom=N  |Number of store reqs failed -ENOMEM                    |
+              +-------+-------------------------------------------------------+
|              |ops=N  |Number of store reqs submitted                         |
+              +-------+-------------------------------------------------------+
|              |run=N  |Number of store reqs granted CPU time                  |
+              +-------+-------------------------------------------------------+
|              |pgs=N  |Number of pages given store req processing time        |
+              +-------+-------------------------------------------------------+
|              |rxd=N  |Number of store reqs deleted from tracking tree        |
+              +-------+-------------------------------------------------------+
|              |olm=N  |Number of store reqs over store limit                  |
+--------------+-------+-------------------------------------------------------+
|VmScan        |nos=N  |Number of release reqs against pages with no           |
|              |       |pending store                                          |
+              +-------+-------------------------------------------------------+
|              |gon=N  |Number of release reqs against pages stored by         |
|              |       |time lock granted                                      |
+              +-------+-------------------------------------------------------+
|              |bsy=N  |Number of release reqs ignored due to in-progress store|
+              +-------+-------------------------------------------------------+
|              |can=N  |Number of page stores cancelled due to release req     |
+--------------+-------+-------------------------------------------------------+
|Ops           |pend=N |Number of times async ops added to pending queues      |
+              +-------+-------------------------------------------------------+
|              |run=N  |Number of times async ops given CPU time               |
+              +-------+-------------------------------------------------------+
|              |enq=N  |Number of times async ops queued for processing        |
+              +-------+-------------------------------------------------------+
|              |can=N  |Number of async ops cancelled                          |
+              +-------+-------------------------------------------------------+
|              |rej=N  |Number of async ops rejected due to object             |
|              |       |lookup/create failure                                  |
+              +-------+-------------------------------------------------------+
|              |ini=N  |Number of async ops initialised                        |
+              +-------+-------------------------------------------------------+
|              |dfr=N  |Number of async ops queued for deferred release        |
+              +-------+-------------------------------------------------------+
|              |rel=N  |Number of async ops released                           |
|              |       |(should equal ini=N when idle)                         |
+              +-------+-------------------------------------------------------+
|              |gc=N   |Number of deferred-release async ops garbage collected |
+--------------+-------+-------------------------------------------------------+
|CacheOp       |alo=N  |Number of in-progress alloc_object() cache ops         |
+              +-------+-------------------------------------------------------+
|              |luo=N  |Number of in-progress lookup_object() cache ops        |
+              +-------+-------------------------------------------------------+
|              |luc=N  |Number of in-progress lookup_complete() cache ops      |
+              +-------+-------------------------------------------------------+
|              |gro=N  |Number of in-progress grab_object() cache ops          |
+              +-------+-------------------------------------------------------+
|              |upo=N  |Number of in-progress update_object() cache ops        |
+              +-------+-------------------------------------------------------+
|              |dro=N  |Number of in-progress drop_object() cache ops          |
+              +-------+-------------------------------------------------------+
|              |pto=N  |Number of in-progress put_object() cache ops           |
+              +-------+-------------------------------------------------------+
|              |syn=N  |Number of in-progress sync_cache() cache ops           |
+              +-------+-------------------------------------------------------+
|              |atc=N  |Number of in-progress attr_changed() cache ops         |
+              +-------+-------------------------------------------------------+
|              |rap=N  |Number of in-progress read_or_alloc_page() cache ops   |
+              +-------+-------------------------------------------------------+
|              |ras=N  |Number of in-progress read_or_alloc_pages() cache ops  |
+              +-------+-------------------------------------------------------+
|              |alp=N  |Number of in-progress allocate_page() cache ops        |
+              +-------+-------------------------------------------------------+
|              |als=N  |Number of in-progress allocate_pages() cache ops       |
+              +-------+-------------------------------------------------------+
|              |wrp=N  |Number of in-progress write_page() cache ops           |
+              +-------+-------------------------------------------------------+
|              |ucp=N  |Number of in-progress uncache_page() cache ops         |
+              +-------+-------------------------------------------------------+
|              |dsp=N  |Number of in-progress dissociate_pages() cache ops     |
+--------------+-------+-------------------------------------------------------+
|CacheEv       |nsp=N  |Number of object lookups/creations rejected due to     |
|              |       |lack of space                                          |
+              +-------+-------------------------------------------------------+
|              |stl=N  |Number of stale objects deleted                        |
+              +-------+-------------------------------------------------------+
|              |rtr=N  |Number of objects retired when relinquished            |
+              +-------+-------------------------------------------------------+
|              |cul=N  |Number of objects culled                               |
+--------------+-------+-------------------------------------------------------+

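The counters above lend themselves to simple scripted checks.  The following is
a minimal sketch of a parser, assuming that each line of the stats file has the
form ``CLASS: event=N event=N ...`` as the table suggests; the exact layout of
the file is an assumption here, the sample text is made up, and
``parse_stats()`` is a name invented for this example.

```python
def parse_stats(text):
    """Parse 'CLASS: key=N key=N' lines into {class: {key: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a 'CLASS: ...' line
        cls, _, rest = line.partition(":")
        events = {}
        for token in rest.split():
            key, sep, value = token.partition("=")
            if sep and value.isdigit():
                events[key] = int(value)
        if events:
            # A class may be spread over several lines; merge them.
            stats.setdefault(cls.strip(), {}).update(events)
    return stats


# Made-up sample in the assumed layout, not captured from a real system.
SAMPLE = """\
Cookies: idx=2 dat=5 spc=0
Ops    : pend=1 run=7 enq=7 can=0 rej=0
Ops    : ini=7 dfr=0 rel=7 gc=0
"""

stats = parse_stats(SAMPLE)
print(stats["Cookies"]["dat"])

# The table notes that rel=N should equal ini=N when the cache is idle.
ops = stats["Ops"]
print("async ops balanced:", ops["ini"] == ops["rel"])
```

On a real system the text would be read from ``/proc/fs/fscache/stats`` instead
of the ``SAMPLE`` string.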


/proc/fs/fscache/histogram
--------------------------

     ::

	cat /proc/fs/fscache/histogram
	JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS
	===== ===== ========= ========= ========= ========= =========

     This shows, for a variety of tasks, how many times each took a given
     amount of time to run, covering durations from 0 jiffies to HZ-1 jiffies.
     The columns are as follows:

	=========	=======================================================
	COLUMN		TIME MEASUREMENT
	=========	=======================================================
	OBJ INST	Length of time to instantiate an object
	OP RUNS		Length of time a call to process an operation took
	OBJ RUNS	Length of time a call to process an object event took
	RETRV DLY	Time between requesting a read and lookup completing
	RETRIEVLS	Time between beginning and end of a retrieval
	=========	=======================================================

     Each row shows the number of events that took a particular range of times.
     Each step is 1 jiffy in size.  The JIFS column indicates the particular
     jiffy range covered, and the SECS field the equivalent number of seconds.

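The relationship between the JIFS and SECS columns is just a division by HZ.
The sketch below shows the arithmetic under the assumption that HZ is 250; the
actual value is a kernel build-time setting, and ``jiffies_to_seconds()`` is a
helper name made up for this illustration.

```python
def jiffies_to_seconds(jiffies, hz):
    """Convert a jiffy count into seconds for a given tick rate."""
    return jiffies / hz


HZ = 250  # assumed tick rate; check the kernel configuration for the real one

# The histogram covers 0 to HZ-1 jiffies, i.e. just under one second.
for jifs in (0, 1, HZ - 1):
    print(f"{jifs:5d} {jiffies_to_seconds(jifs, HZ):.3f}")
```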


Object List
===========

If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
list of all the objects currently allocated and allow them to be viewed
through::

	/proc/fs/fscache/objects

This will look something like::

	[root@andromeda ~]# head /proc/fs/fscache/objects
	OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA       OBJECT_KEY, AUX_DATA
	======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
	   17e4b        2 ACTV     0   0   0   0  0     0 7b  4 0 0 | NFS.fh           DT  0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
	   1693a        2 ACTV     0   0   0   0  0     0 7b  4 0 0 | NFS.fh           DT  0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a

where the first set of columns before the '|' describe the object:

	=======	===============================================================
	COLUMN	DESCRIPTION
	=======	===============================================================
	OBJECT	Object debugging ID (appears as OBJ%x in some debug messages)
	PARENT	Debugging ID of parent object
	STAT	Object state
	CHLDN	Number of child objects of this object
	OPS	Number of outstanding operations on this object
	OOP	Number of outstanding child object management operations
	IPR
	EX	Number of outstanding exclusive operations
	READS	Number of outstanding read operations
	EM	Object's event mask
	EV	Events raised on this object
	F	Object flags
	S	Object work item busy state mask (1:pending 2:running)
	=======	===============================================================

and the second set of columns describe the object's cookie, if present:

	================ ======================================================
	COLUMN		 DESCRIPTION
	================ ======================================================
	NETFS_COOKIE_DEF Name of netfs cookie definition
	TY		 Cookie type (IX - index, DT - data, hex - special)
	FL		 Cookie flags
	NETFS_DATA	 Netfs private data stored in the cookie
	OBJECT_KEY	 Object key } 1 column, with separating comma
	AUX_DATA	 Object aux data } presence may be configured
	================ ======================================================

The data shown may be filtered by attaching a key to an appropriate keyring
before viewing the file.  Something like::

		keyctl add user fscache:objlist <restrictions> @s

where <restrictions> are a selection of the following letters:

	==	=========================================================
	K	Show hexdump of object key (don't show if not given)
	A	Show hexdump of object aux data (don't show if not given)
	==	=========================================================

and the following paired letters:

	==	=========================================================
	C	Show objects that have a cookie
	c	Show objects that don't have a cookie
	B	Show objects that are busy
	b	Show objects that aren't busy
	W	Show objects that have pending writes
	w	Show objects that don't have pending writes
	R	Show objects that have outstanding reads
	r	Show objects that don't have outstanding reads
	S	Show objects that have work queued
	s	Show objects that don't have work queued
	==	=========================================================

If neither side of a letter pair is given, then both are implied.  For example::

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

By default all objects and all fields will be shown.

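The way paired letters are implied can be expressed in a few lines.  This is a
model of the rule described above rather than the kernel's own parsing code;
``expand_restrictions()`` and the two constants are names made up for the
sketch, and it only covers the case where a restriction key has been attached.

```python
FIELD_LETTERS = "KA"                        # hexdump fields, shown only if given
LETTER_PAIRS = ["Cc", "Bb", "Ww", "Rr", "Ss"]


def expand_restrictions(spec):
    """Return (fields to dump, filter letters in effect) for a restriction string."""
    fields = {ch for ch in spec if ch in FIELD_LETTERS}
    filters = set()
    for pair in LETTER_PAIRS:
        given = {ch for ch in spec if ch in pair}
        # If neither side of a pair is given, both sides are implied.
        filters |= given if given else set(pair)
    return fields, filters


fields, filters = expand_restrictions("KB")
print(sorted(fields))   # only the object key is dumped
print(sorted(filters))  # 'B' is in effect but 'b' is not
```

For the ``KB`` example this yields the key field only, plus ``B`` and the
implied ``CcWwRrSs`` as the filter letters in effect.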

Debugging
=========

If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
debugging enabled by adjusting the value in::

	/sys/module/fscache/parameters/debug

This is a bitmask of debugging streams to enable:

	=======	=======	===============================	=======================
	BIT	VALUE	STREAM				POINT
	=======	=======	===============================	=======================
	0	1	Cache management		Function entry trace
	1	2					Function exit trace
	2	4					General
	3	8	Cookie management		Function entry trace
	4	16					Function exit trace
	5	32					General
	6	64	Page handling			Function entry trace
	7	128					Function exit trace
	8	256					General
	9	512	Operation management		Function entry trace
	10	1024					Function exit trace
	11	2048					General
	=======	=======	===============================	=======================

The appropriate set of values should be OR'd together and the result written to
the control file.  For example::

	echo $((1|8|64|512)) >/sys/module/fscache/parameters/debug

will turn on all function entry debugging.
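
Because each stream occupies three consecutive bits in the table above, the mask
can also be computed rather than looked up.  The sketch below does this with
short labels (``cache``, ``cookie``, ``page``, ``operation`` and ``enter``,
``exit``, ``general``) that are invented for the example and are not names the
kernel recognises.

```python
# Order follows the table: three bits per stream, starting at bit 0.
STREAMS = ["cache", "cookie", "page", "operation"]
POINTS = ["enter", "exit", "general"]


def debug_mask(selection):
    """OR together the bits for an iterable of (stream, point) pairs."""
    mask = 0
    for stream, point in selection:
        bit = STREAMS.index(stream) * 3 + POINTS.index(point)
        mask |= 1 << bit
    return mask


# Function entry tracing for every stream listed in the table.
all_entry = debug_mask((stream, "enter") for stream in STREAMS)
print(all_entry)

# Everything for cookie management only.
print(debug_mask(("cookie", point) for point in POINTS))
```

The printed value is what would be written to
``/sys/module/fscache/parameters/debug``.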
566