.. SPDX-License-Identifier: GPL-2.0

===================================
Cache on Already Mounted Filesystem
===================================

.. Contents:

 (*) Overview.

 (*) Requirements.

 (*) Configuration.

 (*) Starting the cache.

 (*) Things to avoid.

 (*) Cache culling.

 (*) Cache structure.

 (*) Security model and SELinux.

 (*) A note on security.

 (*) Statistical information.

 (*) Debugging.

 (*) On-demand Read.


Overview
========

CacheFiles is a caching backend that uses a directory on an already mounted
filesystem of a local type (such as Ext3) as its cache.

CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling.  This is called cachefilesd and lives in
/sbin.

The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services.  Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.

CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communicate with the daemon.  Only one thing may have this open at once,
and while it is open, a cache is at least partially in existence.  The daemon
opens this and sends commands down it to control the cache.

CacheFiles is currently limited to a single cache.

CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section.  This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.



Requirements
============

The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:

	- dnotify.

	- extended attributes (xattrs).

	- openat() and friends.

	- bmap() support on files in the filesystem (FIBMAP ioctl).

	- The use of bmap() to detect a partial page at the end of the file.

It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.


Configuration
=============

The cache is configured by a script in /etc/cachefilesd.conf.  The commands in
this script set up the cache ready for use.  The following script commands are
available:

 brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
	Configure the culling limits.  Optional.  See the section on culling.
	The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.

	The commands beginning with a 'b' are file space (block) limits, those
	beginning with an 'f' are file count limits.

 dir <path>
	Specify the directory containing the root of the cache.  Mandatory.

 tag <name>
	Specify a tag for FS-Cache to use in distinguishing multiple caches.
	Optional.  The default is "CacheFiles".

 debug <mask>
	Specify a numeric bitmask to control debugging in the kernel module.
	Optional.  The default is zero (all off).  The following values can be
	OR'd into the mask to collect various information:

		==	=================================================
		1	Turn on trace of function entry (_enter() macros)
		2	Turn on trace of function exit (_leave() macros)
		4	Turn on trace of internal debug points (_debug())
		==	=================================================

	This mask can also be set through sysfs, eg::

		echo 5 >/sys/module/cachefiles/parameters/debug

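As an illustration of the commands listed above, the following Python sketch
checks a configuration script and fills in the documented defaults.  It is not
how cachefilesd itself parses the file: the line syntax (one command per line,
whitespace-separated argument, '#' comments) is an assumption, and the names
parse_config, LIMITS and DEFAULTS are made up for this example.

```python
# Sketch only: the line syntax (one command per line, '#' comments) is assumed.
LIMITS = ("brun", "bcull", "bstop", "frun", "fcull", "fstop")
DEFAULTS = {"brun": 7, "bcull": 5, "bstop": 1,
            "frun": 7, "fcull": 5, "fstop": 1,
            "tag": "CacheFiles", "debug": 0}


def parse_config(text):
    """Parse a cachefilesd.conf-style script into a dict of settings."""
    conf = dict(DEFAULTS)
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or (assumed) comment
        parts = line.split(None, 1)
        cmd = parts[0]
        arg = parts[1].strip() if len(parts) > 1 else ""
        if cmd in LIMITS:
            if not arg.endswith("%"):
                raise ValueError(f"line {lineno}: {cmd} needs <N>%")
            conf[cmd] = int(arg[:-1])
        elif cmd == "debug":
            conf[cmd] = int(arg, 0)       # accepts decimal or 0x... masks
        elif cmd in ("dir", "tag"):
            if not arg:
                raise ValueError(f"line {lineno}: {cmd} needs an argument")
            conf[cmd] = arg
        else:
            raise ValueError(f"line {lineno}: unknown command {cmd!r}")
    if "dir" not in conf:
        raise ValueError("the dir command is mandatory")
    return conf


example = """\
# hypothetical example script
dir /var/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
"""
settings = parse_config(example)
print(settings["dir"], settings["tag"], settings["brun"], settings["frun"])
```

The file count limits are left out of the example script on purpose, so the
sketch falls back to the 7%, 5% and 1% defaults for them.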

Starting the Cache
==================

The cache is started by running the daemon.  The daemon opens the cache device,
configures the cache and tells it to begin caching.  At that point the cache
binds to fscache and the cache becomes live.

The daemon is run as follows::

	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]

The flags are:

 ``-d``
	Increase the debugging level.  This can be specified multiple times and
	is cumulative with itself.

 ``-s``
	Send messages to stderr instead of syslog.

 ``-n``
	Don't daemonise and go into background.

 ``-f <configfile>``
	Use an alternative configuration file rather than the default one.


Things to Avoid
===============

Do not mount other things within the cache as this will cause problems.  The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.

Do not create, rename or unlink files and directories in the cache while the
cache is active, as this may cause the state to become uncertain.

Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).

Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.

Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.

Do not chmod files in the cache.  The module creates things with minimal
permissions to prevent random users being able to access them directly.


Cache Culling
=============

The cache may need culling occasionally to make space.  This involves
discarding objects from the cache that have been used less recently than
anything else.  Culling is based on the access time of data objects.  Empty
directories are culled if not in use.

Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem.  There are six
"limits":

 brun, frun
     If the amount of free space and the number of available files in the cache
     rise above both these limits, then culling is turned off.

 bcull, fcull
     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then culling is started.

 bstop, fstop
     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then no further allocation of
     disk space or files is permitted until culling has raised things above
     these limits again.

These must be configured as follows::

	0 <= bstop < bcull < brun < 100
	0 <= fstop < fcull < frun < 100

Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.

The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order.  A new scan of the cache is
started as soon as space is made in the table.  Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.

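The interaction of the six limits can be sketched in Python as a small state
machine.  This is a model of the rules described in this section, not the code
used by the kernel module or the daemon: the function names are made up, the
handling of values exactly equal to a limit is an assumption, and the
percentages fed in are invented.

```python
DEFAULT_LIMITS = {"brun": 7, "bcull": 5, "bstop": 1,
                  "frun": 7, "fcull": 5, "fstop": 1}


def check_limits(lim):
    """Raise ValueError unless 0 <= stop < cull < run < 100 for b* and f*."""
    for p in ("b", "f"):
        stop, cull, run = lim[p + "stop"], lim[p + "cull"], lim[p + "run"]
        if not 0 <= stop < cull < run < 100:
            raise ValueError(f"bad {p}* limits: {stop}, {cull}, {run}")


def update_culling(culling, bavail, favail, lim):
    """Return the new culling state given the percentages available."""
    if bavail < lim["bcull"] or favail < lim["fcull"]:
        return True               # below either cull limit: start culling
    if bavail > lim["brun"] and favail > lim["frun"]:
        return False              # above both run limits: stop culling
    return culling                # in between: keep the current state


def may_allocate(bavail, favail, lim):
    """False while either percentage is below its stop limit."""
    return bavail >= lim["bstop"] and favail >= lim["fstop"]


def cull_order(objects):
    """Order (name, atime, in_use) tuples least recently used first,
    skipping objects that are still in use."""
    return [name for name, atime, in_use in sorted(objects, key=lambda o: o[1])
            if not in_use]


check_limits(DEFAULT_LIMITS)
state = False
for bavail in (20, 6, 4, 6, 8):          # made-up block percentages
    state = update_culling(state, bavail, 50, DEFAULT_LIMITS)
    print(bavail, state)
```

Because culling starts below the cull limits but only stops above the run
limits, the gap between them acts as hysteresis and keeps culling from
switching on and off repeatedly around a single threshold.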

Cache Structure
===============

The CacheFiles module will create two directories in the directory it was
given:

 * cache/
 * graveyard/

The active cache objects all reside in the first directory.  The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard, from which the daemon will actually delete them.

The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.


The module represents index objects as directories with the filename "I..." or
"J...".  Note that the "cache/" directory is itself a special index.

Data objects are represented as files if they have no children, or directories
if they do.  Their filenames all begin "D..." or "E...".  If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.

Special objects are similar to data objects, except their filenames begin
"S..." or "T...".


If an object has children, then it will be represented as a directory.
Immediately in the representative directory is a collection of directories
named for hash values of the child object keys with an '@' prepended.  Into
these directories, if possible, will be placed the representations of the child
objects::

	 /INDEX    /INDEX     /INDEX                            /DATA FILES
	/=========/==========/=================================/================
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry


If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will name the object
inside the last directory.  The names of the intermediate directories will have
'+' prepended::

	J1223/@23/+xy...z/+kl...m/Epqr


Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable.  The two versions of
object filenames indicate the encoding:

	===============	===============	===============
	OBJECT TYPE	PRINTABLE	ENCODED
	===============	===============	===============
	Index		"I..."		"J..."
	Data		"D..."		"E..."
	Special		"S..."		"T..."
	===============	===============	===============

Intermediate directories are always "@" or "+" as appropriate.

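The naming scheme can be read back with a short Python sketch that labels each
component of a path inside the cache by its first character.  It only restates
the prefix table above; the function name classify and the descriptive strings
it returns are made up for this example, and the path is taken from the layout
shown earlier in this section.

```python
# First character of an object filename -> (object type, key encoding).
PREFIXES = {
    "I": ("index", "printable"), "J": ("index", "encoded"),
    "D": ("data", "printable"), "E": ("data", "encoded"),
    "S": ("special", "printable"), "T": ("special", "encoded"),
}


def classify(component):
    """Describe one path component found below the cache/ directory."""
    first = component[:1]
    if first == "@":
        return ("hash directory", None)
    if first == "+":
        return ("long-key intermediate directory", None)
    return PREFIXES.get(first, ("unknown", None))


path = "@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400"
for part in path.split("/"):
    print(part, classify(part))
```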

Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs.  The latter is used to detect stale objects in the cache and update
or retire them.


Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).


Security Model and SELinux
==========================

CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.

One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.

The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it is the target of an operation
performed by some other process (so signalling and suchlike still work
correctly).


When the CacheFiles module is asked to bind to its cache, it:

 (1) Finds the security label attached to the root cache directory and uses
     that as the security label with which it will create files.  By default,
     this is::

	cachefiles_var_t

 (2) Finds the security label of the process which issued the bind request
     (presumed to be the cachefilesd daemon), which by default will be::

	cachefilesd_t

     and asks LSM to supply a security ID as which it should act given the
     daemon's label.  By default, this will be::

	cachefiles_kernel_t

     SELinux transitions the daemon's security ID to the module's security ID
     based on a rule of this form in the policy::

	type_transition <daemon's-ID> kernel_t : process <module's-ID>;

     For instance::

	type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;


The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.

The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories.  It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.


There are policy source files available in:

	https://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2

and later versions.  In that tarball, see the files::

	cachefilesd.te
	cachefilesd.fc
	cachefilesd.if

They are built and installed directly by the RPM.

If a non-RPM based system is being used, then copy the above files to their own
directory and run::

	make -f /usr/share/selinux/devel/Makefile
	semodule -i cachefilesd.pp

You will need checkpolicy and selinux-policy-devel installed prior to the
build.


By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, then either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.

For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see::

	/usr/share/doc/cachefilesd-*/move-cache.txt

This file is present when the cachefilesd rpm is installed; alternatively, the
document can be found in the sources.


A Note on Security
==================

CacheFiles makes use of the split security in the task_struct.  It allocates
its own task_security structure, and redirects current->cred to point to it
when it acts on behalf of another process, in that process's context.

The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly.  Therefore the VFS and LSM
may deny CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.

Furthermore, should CacheFiles create a file or directory, the security
parameters with which that object is created (UID, GID, security label) would
be derived from the process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).

What is required is to temporarily override the security of the process that
issued the system call.  We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.

So CacheFiles makes use of a logical split in the security between the
objective security (task->real_cred) and the subjective security (task->cred).
The objective security holds the intrinsic security properties of a process and
is never overridden.  This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).

The subjective security holds the active security properties of a process, and
may be overridden.  This is not seen externally, and is used when a process
acts upon another object, for example SIGKILLing another process or opening a
file.

LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.


Statistical Information
=======================

If FS-Cache is compiled with the following option enabled::

	CONFIG_CACHEFILES_HISTOGRAM=y

then it will gather certain statistics and display them through a proc file.

 /proc/fs/cachefiles/histogram

     ::

	cat /proc/fs/cachefiles/histogram
	JIFS  SECS  LOOKUPS   MKDIRS    CREATES
	===== ===== ========= ========= =========

     This shows, for each duration between 0 jiffies and HZ-1 jiffies, the
     number of times a variety of tasks took that long to run.  The columns
     are as follows:

	=======		=======================================================
	COLUMN		TIME MEASUREMENT
	=======		=======================================================
	LOOKUPS		Length of time to perform a lookup on the backing fs
	MKDIRS		Length of time to perform a mkdir on the backing fs
	CREATES		Length of time to perform a create on the backing fs
	=======		=======================================================

     Each row shows the number of events that took a particular range of times.
     Each step is 1 jiffy in size.  The JIFS column indicates the particular
     jiffy range covered, and the SECS field the equivalent number of seconds.

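The relationship between the JIFS and SECS columns is a division by HZ, as the
following Python sketch shows.  The value of HZ depends on the kernel
configuration, so the 250 used here is only an assumption, and bucket_seconds
is a name made up for this example.

```python
def bucket_seconds(jifs, hz):
    """Seconds equivalent to the start of the one-jiffy bucket numbered jifs."""
    if not 0 <= jifs < hz:
        raise ValueError("the histogram only covers 0 to HZ-1 jiffies")
    return jifs / hz


hz = 250                                  # assumed; depends on kernel config
for jifs in (0, 1, 25):
    print(f"{jifs:5d} {bucket_seconds(jifs, hz):.3f}")
```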
461d74802adSMauro Carvalho Chehab
462d74802adSMauro Carvalho ChehabDebugging
463d74802adSMauro Carvalho Chehab=========
464d74802adSMauro Carvalho Chehab
465d74802adSMauro Carvalho ChehabIf CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
466d74802adSMauro Carvalho Chehabdebugging enabled by adjusting the value in::
467d74802adSMauro Carvalho Chehab
468d74802adSMauro Carvalho Chehab	/sys/module/cachefiles/parameters/debug
469d74802adSMauro Carvalho Chehab
470d74802adSMauro Carvalho ChehabThis is a bitmask of debugging streams to enable:
471d74802adSMauro Carvalho Chehab
472d74802adSMauro Carvalho Chehab	=======	=======	===============================	=======================
473d74802adSMauro Carvalho Chehab	BIT	VALUE	STREAM				POINT
474d74802adSMauro Carvalho Chehab	=======	=======	===============================	=======================
475d74802adSMauro Carvalho Chehab	0	1	General				Function entry trace
476d74802adSMauro Carvalho Chehab	1	2					Function exit trace
477d74802adSMauro Carvalho Chehab	2	4					General
478d74802adSMauro Carvalho Chehab	=======	=======	===============================	=======================
479d74802adSMauro Carvalho Chehab
480d74802adSMauro Carvalho ChehabThe appropriate set of values should be OR'd together and the result written to
481d74802adSMauro Carvalho Chehabthe control file.  For example::
482d74802adSMauro Carvalho Chehab
483d74802adSMauro Carvalho Chehab	echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
484d74802adSMauro Carvalho Chehab
485d74802adSMauro Carvalho Chehabwill turn on all function entry debugging.
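
The same thing can be done from a program.  The following is a minimal sketch,
not a definitive implementation: the macro and function names are illustrative
only, and the bit values are taken from the table above.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative names; the values come from the bitmask table. */
#define DBG_ENTER	0x1	/* bit 0: function entry trace */
#define DBG_LEAVE	0x2	/* bit 1: function exit trace */
#define DBG_GENERAL	0x4	/* bit 2: general debugging */

/*
 * Write 'mask' as a decimal number to the control file at 'path'.
 * Returns 0 on success, -1 on failure.
 */
int write_debug_mask(const char *path, unsigned int mask)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass ``/sys/module/cachefiles/parameters/debug`` as the path and
something like ``DBG_ENTER | DBG_GENERAL`` as the mask.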


On-demand Read
==============

When working in its original mode, CacheFiles serves as a local cache for a
remote networking fs.  In on-demand read mode, CacheFiles can instead
accelerate scenarios that need on-demand read semantics, e.g. container image
distribution.

The essential difference between these two modes is seen when a cache miss
occurs: In the original mode, the netfs will fetch the data from the remote
server and then write it to the cache file; in on-demand read mode, fetching
the data and writing it into the cache is delegated to a user daemon.

``CONFIG_CACHEFILES_ONDEMAND`` should be enabled to support on-demand read mode.


Protocol Communication
----------------------

The on-demand read mode uses a simple protocol for communication between kernel
and user daemon. The protocol can be modeled as::

	kernel --[request]--> user daemon --[reply]--> kernel

CacheFiles will send requests to the user daemon when needed.  The user daemon
should poll the devnode ('/dev/cachefiles') to check if there's a pending
request to be processed.  A POLLIN event will be returned when there's a pending
request.

The user daemon then reads the devnode to fetch a request to process.  Note
that each read returns only one request.  When it has finished processing
the request, the user daemon should write the reply to the devnode.
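
The poll-then-read step described above might be sketched as follows.  This is
only an illustration: the function name is hypothetical, and in a real daemon
``fd`` would be the descriptor obtained by opening ``/dev/cachefiles``.  The
caller is assumed to supply a buffer large enough to hold a whole request.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to 'timeout_ms' for a pending request on 'fd' and read one.
 * Returns the number of bytes read, 0 if the wait timed out, or -1 on error.
 */
ssize_t read_one_request(int fd, void *buf, size_t size, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n < 0)
		return -1;
	if (n == 0)
		return 0;		/* nothing pending yet */
	if (!(pfd.revents & POLLIN))
		return -1;		/* error or hangup on the descriptor */
	return read(fd, buf, size);	/* each read returns a single request */
}
```

A daemon would typically call this in a loop and hand each buffer it gets back
to a dispatcher keyed on the request's opcode.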

Each request starts with a message header of the form::

	struct cachefiles_msg {
		__u32 msg_id;
		__u32 opcode;
		__u32 len;
		__u32 object_id;
		__u8  data[];
	};

where:

	* ``msg_id`` is a unique ID identifying this request among all pending
	  requests.

	* ``opcode`` indicates the type of this request.

	* ``object_id`` is a unique ID identifying the cache file operated on.

	* ``data`` indicates the payload of this request.

	* ``len`` indicates the whole length of this request, including the
	  header and following type-specific payload.
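
As an illustration of how ``len`` relates to the payload, the sketch below
copies the fixed part of the header out of a buffer returned by one read and
derives the payload size.  The struct is a userspace mirror written for this
example (a real daemon would more likely use the kernel's own definition), and
the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the fixed part of the header shown above. */
struct msg_hdr {
	uint32_t msg_id;
	uint32_t opcode;
	uint32_t len;
	uint32_t object_id;
};

/*
 * Copy the header out of 'buf', which holds 'nread' bytes from one read,
 * and report the payload size.  Returns 0 on success, or -1 if the sizes
 * are inconsistent.
 */
int parse_msg(const void *buf, size_t nread, struct msg_hdr *hdr,
	      size_t *payload_len)
{
	if (nread < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));	/* avoids unaligned access */
	if (hdr->len < sizeof(*hdr) || hdr->len > nread)
		return -1;
	*payload_len = hdr->len - sizeof(*hdr);
	return 0;
}
```

The payload itself starts ``sizeof(struct msg_hdr)`` bytes into the buffer and
is interpreted according to ``opcode``.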


Turning on On-demand Mode
-------------------------

An optional parameter becomes available to the "bind" command::

	bind [ondemand]

When the "bind" command is given no argument, it defaults to the original mode.
When it is given the "ondemand" argument, i.e. "bind ondemand", on-demand read
mode will be enabled.
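
A daemon might issue the command along the following lines.  This is a sketch
under assumptions: the function name is hypothetical, ``fd`` is taken to be an
open descriptor on the devnode, and the cache is taken to have been configured
already as described in the Configuration section.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Issue the "bind" command on 'fd', optionally selecting on-demand mode.
 * Returns 0 on success, -1 on failure.
 */
int send_bind(int fd, int ondemand)
{
	const char *cmd = ondemand ? "bind ondemand" : "bind";
	size_t len = strlen(cmd);

	return write(fd, cmd, len) == (ssize_t)len ? 0 : -1;
}
```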


The OPEN Request
----------------

When the netfs opens a cache file for the first time, a request with the
CACHEFILES_OP_OPEN opcode (an OPEN request) is sent to the user daemon.  The
payload is of the form::

	struct cachefiles_open {
		__u32 volume_key_size;
		__u32 cookie_key_size;
		__u32 fd;
		__u32 flags;
		__u8  data[];
	};

where:

	* ``data`` contains the volume_key followed directly by the cookie_key.
	  The volume key is a NUL-terminated string; the cookie key is binary
	  data.

	* ``volume_key_size`` indicates the size of the volume key in bytes.

	* ``cookie_key_size`` indicates the size of the cookie key in bytes.

	* ``fd`` indicates an anonymous fd referring to the cache file, through
	  which the user daemon can perform write/llseek file operations on the
	  cache file.


The user daemon can use the given (volume_key, cookie_key) pair to distinguish
the requested cache file.  With the given anonymous fd, the user daemon can
fetch the data and write it to the cache file in the background, even when
the kernel has not triggered a cache miss yet.
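
To make the layout of ``data`` concrete, the following sketch locates the two
keys inside an OPEN payload.  The struct is a userspace mirror written for this
example and the function name is hypothetical; the size checks are there
because the sketch does not assume the payload is well formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the fixed part of the OPEN payload shown above. */
struct open_hdr {
	uint32_t volume_key_size;
	uint32_t cookie_key_size;
	uint32_t fd;
	uint32_t flags;
};

/*
 * Locate the volume key and cookie key inside an OPEN payload of 'plen'
 * bytes.  Returns 0 on success, or -1 if the sizes do not fit.
 */
int split_open(const unsigned char *payload, size_t plen, struct open_hdr *oh,
	       const unsigned char **volume_key,
	       const unsigned char **cookie_key)
{
	size_t room;

	if (plen < sizeof(*oh))
		return -1;
	memcpy(oh, payload, sizeof(*oh));
	room = plen - sizeof(*oh);
	if (oh->volume_key_size > room ||
	    oh->cookie_key_size > room - oh->volume_key_size)
		return -1;
	*volume_key = payload + sizeof(*oh);		  /* NUL-terminated string */
	*cookie_key = *volume_key + oh->volume_key_size;  /* binary data */
	return 0;
}
```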

Note that each cache file has a unique object_id, while it may have multiple
anonymous fds.  The user daemon may duplicate anonymous fds from the initial
anonymous fd indicated by the ``fd`` field through dup().  Thus each object_id
can be mapped to multiple anonymous fds, and the user daemon itself needs to
maintain the mapping.

When implementing a user daemon, be mindful of RLIMIT_NOFILE,
``/proc/sys/fs/nr_open`` and ``/proc/sys/fs/file-max``.  Typically these needn't
be huge since they're related to the number of open device blobs rather than
open files of each individual filesystem.

The user daemon should reply to the OPEN request by issuing a "copen" (complete
open) command on the devnode::

	copen <msg_id>,<cache_size>

where:

	* ``msg_id`` must match the msg_id field of the OPEN request.

	* When >= 0, ``cache_size`` indicates the size of the cache file;
	  when < 0, ``cache_size`` indicates any error code encountered by the
	  user daemon.
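
Building the reply string might look like the sketch below.  The function name
is hypothetical, and decimal formatting of both fields is an assumption made
for this example; the resulting string would then be written to the devnode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a "copen" reply into 'buf'.  'cache_size' is the size of the cache
 * file when >= 0, or a negative error code otherwise.  Returns the length of
 * the command, or -1 if 'buf' is too small.
 */
int format_copen(char *buf, size_t size, uint32_t msg_id, long long cache_size)
{
	int n = snprintf(buf, size, "copen %u,%lld",
			 (unsigned int)msg_id, cache_size);

	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, replying to an OPEN request whose ``msg_id`` is 7 for a 4096-byte
file would produce ``copen 7,4096``.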


The CLOSE Request
-----------------

When a cookie is withdrawn, a CLOSE request (opcode CACHEFILES_OP_CLOSE) will be
sent to the user daemon.  This tells the user daemon to close all anonymous fds
associated with the given object_id.  The CLOSE request has no extra payload,
and should not be replied to.
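
One way a daemon could keep the object_id to fd mapping and act on a CLOSE
request is sketched below.  The fixed-size table, its limit and all names are
assumptions made for this example; a real daemon would probably want a
dynamically sized structure.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

#define MAX_TRACKED 64	/* arbitrary limit for the sketch */

struct tracked_fd {
	uint32_t object_id;
	int fd;
	int in_use;
};

static struct tracked_fd fd_table[MAX_TRACKED];

/* Record that 'fd' refers to 'object_id'.  Returns 0, or -1 if full. */
int track_fd(uint32_t object_id, int fd)
{
	for (int i = 0; i < MAX_TRACKED; i++) {
		if (!fd_table[i].in_use) {
			fd_table[i] = (struct tracked_fd){ object_id, fd, 1 };
			return 0;
		}
	}
	return -1;
}

/*
 * Handle a CLOSE request: close every fd recorded for 'object_id'.
 * Returns the number of fds closed.  No reply is sent to the kernel.
 */
int close_object(uint32_t object_id)
{
	int closed = 0;

	for (int i = 0; i < MAX_TRACKED; i++) {
		if (fd_table[i].in_use && fd_table[i].object_id == object_id) {
			close(fd_table[i].fd);
			fd_table[i].in_use = 0;
			closed++;
		}
	}
	return closed;
}
```

The daemon would call ``track_fd()`` for the fd delivered in an OPEN request
and for every duplicate it creates with dup().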


The READ Request
----------------

When a cache miss is encountered in on-demand read mode, CacheFiles will send a
READ request (opcode CACHEFILES_OP_READ) to the user daemon. This tells the user
daemon to fetch the contents of the requested file range.  The payload is of the
form::

	struct cachefiles_read {
		__u64 off;
		__u64 len;
	};

where:

	* ``off`` indicates the starting offset of the requested file range.

	* ``len`` indicates the length of the requested file range.


When it receives a READ request, the user daemon should fetch the requested data
and write it to the cache file identified by object_id.

When it has finished processing the READ request, the user daemon should reply
by using the CACHEFILES_IOC_READ_COMPLETE ioctl on one of the anonymous fds
associated with the object_id given in the READ request.  The ioctl is of the
form::

	ioctl(fd, CACHEFILES_IOC_READ_COMPLETE, msg_id);

where:

	* ``fd`` is one of the anonymous fds associated with the object_id
	  given.

	* ``msg_id`` must match the msg_id field of the READ request.