.. SPDX-License-Identifier: GPL-2.0

===================================
Cache on Already Mounted Filesystem
===================================

.. Contents:

 (*) Overview.

 (*) Requirements.

 (*) Configuration.

 (*) Starting the cache.

 (*) Things to avoid.

 (*) Cache culling.

 (*) Cache structure.

 (*) Security model and SELinux.

 (*) A note on security.

 (*) Statistical information.

 (*) Debugging.



Overview
========

CacheFiles is a caching backend that uses a directory on an already mounted
filesystem of a local type (such as Ext3) as a cache.

CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling.  This is called cachefilesd and lives in
/sbin.

The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services.  Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.

CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communicate with the daemon.  Only one thing may have this open at once,
and while it is open, a cache is at least partially in existence.  The daemon
opens this and sends commands down it to control the cache.

CacheFiles is currently limited to a single cache.

CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section.  This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.

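
As a rough illustration of the free-space figures involved, the sketch below
prints the percentage of blocks and of files still available on the filesystem
that holds a given path.  It assumes GNU coreutils ``stat``; the helper name
``fs_available_percent`` is hypothetical and is not part of cachefilesd, and
the kernel module's own accounting may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper (not part of cachefilesd): print the percentage of
# blocks and of files (inodes) still available on the filesystem holding
# the given path.  Assumes GNU coreutils stat.
fs_available_percent() {
    # %a = blocks available to unprivileged users, %b = total blocks,
    # %d = free file nodes, %c = total file nodes
    fs_stats=$(stat -f -c '%a %b %d %c' "$1") || return 1
    set -- $fs_stats
    # Some filesystems report zero totals; print "n/a" rather than divide by 0.
    if [ "$2" -gt 0 ]; then bpct="$(( $1 * 100 / $2 ))%"; else bpct="n/a"; fi
    if [ "$4" -gt 0 ]; then fpct="$(( $3 * 100 / $4 ))%"; else fpct="n/a"; fi
    echo "blocks=$bpct files=$fpct"
}

# Example: inspect the filesystem that holds the root directory.
fs_available_percent /
```

Pointing the helper at the cache directory instead of ``/`` shows how much
headroom the backing filesystem currently has.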



Requirements
============

The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:

  - dnotify.

  - extended attributes (xattrs).

  - openat() and friends.

  - bmap() support on files in the filesystem (FIBMAP ioctl).

  - The use of bmap() to detect a partial page at the end of the file.

It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.


Configuration
=============

The cache is configured by a script in /etc/cachefilesd.conf.  These commands
set up the cache ready for use.  The following script commands are available:

 brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
     Configure the culling limits.  Optional.  See the section on culling.
     The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.

     The commands beginning with a 'b' are file space (block) limits, those
     beginning with an 'f' are file count limits.

 dir <path>
     Specify the directory containing the root of the cache.  Mandatory.

 tag <name>
     Specify a tag to FS-Cache to use in distinguishing multiple caches.
     Optional.  The default is "CacheFiles".

 debug <mask>
     Specify a numeric bitmask to control debugging in the kernel module.
     Optional.  The default is zero (all off).  The following values can be
     OR'd into the mask to collect various information:

     ==  =================================================
     1   Turn on trace of function entry (_enter() macros)
     2   Turn on trace of function exit (_leave() macros)
     4   Turn on trace of internal debug points (_debug())
     ==  =================================================

     This mask can also be set through sysfs, e.g.::

         echo 5 >/sys/module/cachefiles/parameters/debug


Starting the Cache
==================

The cache is started by running the daemon.  The daemon opens the cache device,
configures the cache and tells it to begin caching.  At that point the cache
binds to fscache and the cache becomes live.

The daemon is run as follows::

    /sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]

The flags are:

 ``-d``
     Increase the debugging level.  This can be specified multiple times and
     is cumulative with itself.

 ``-s``
     Send messages to stderr instead of syslog.

 ``-n``
     Don't daemonise and go into background.

 ``-f <configfile>``
     Use an alternative configuration file rather than the default one.


Things to Avoid
===============

Do not mount other things within the cache as this will cause problems.  The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.

Do not create, rename or unlink files and directories in the cache while the
cache is active, as this may cause the state to become uncertain.

Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).

Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.

Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.

Do not chmod files in the cache.  The module creates things with minimal
permissions to prevent random users being able to access them directly.


Cache Culling
=============

The cache may need culling occasionally to make space.  This involves
discarding objects from the cache that have been used less recently than
anything else.  Culling is based on the access time of data objects.  Empty
directories are culled if not in use.

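
To make the access-time ordering concrete, the sketch below lists the regular
files under a directory with the least recently accessed first.  It only
illustrates the ordering and is not how cachefilesd builds its cull table; the
helper name ``list_by_atime`` is hypothetical, GNU ``find`` (for ``-printf``)
is assumed, and the demonstration runs on a scratch directory rather than a
live cache.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory, least recently
# accessed first.  Assumes GNU find, whose -printf '%A@' prints the access
# time as seconds since the epoch.
list_by_atime() {
    find "$1" -type f -printf '%A@ %p\n' | sort -n | cut -d' ' -f2-
}

# Demonstration on a scratch directory with explicitly chosen access times.
demo_dir=$(mktemp -d)
touch -a -t 200201010000 "$demo_dir/newer"
touch -a -t 200101010000 "$demo_dir/older"
list_by_atime "$demo_dir"    # "older" is listed before "newer"
rm -rf "$demo_dir"
```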

Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem.  There are six
"limits":

 brun, frun
     If the amount of free space and the number of available files in the cache
     rise above both these limits, then culling is turned off.

 bcull, fcull
     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then culling is started.

 bstop, fstop
     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then no further allocation of
     disk space or files is permitted until culling has raised things above
     these limits again.

These must be configured thusly::

    0 <= bstop < bcull < brun < 100
    0 <= fstop < fcull < frun < 100

Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.

The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order.  A new scan of the cache is
started as soon as space is made in the table.  Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.


Cache Structure
===============

The CacheFiles module will create two directories in the directory it was
given:

 * cache/
 * graveyard/

The active cache objects all reside in the first directory.  The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard, from which the daemon will actually delete them.

The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.


The module represents index objects as directories with the filename "I..." or
"J...".  Note that the "cache/" directory is itself a special index.

Data objects are represented as files if they have no children, or directories
if they do.  Their filenames all begin "D..." or "E...".  If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.

Special objects are similar to data objects, except their filenames begin
"S..." or "T...".


If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended.  Into
these directories, if possible, will be placed the representations of the child
objects::

     /INDEX    /INDEX     /INDEX                            /DATA FILES
    /=========/==========/=================================/================
    cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
    cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
    cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
    cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry


If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last of which will be the object inside
the last directory.  The names of the intermediate directories will have
'+' prepended::

    J1223/@23/+xy...z/+kl...m/Epqr


Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable.
The two versions of object filenames indicate the encoding:

    ===============  ===============  ===============
    OBJECT TYPE      PRINTABLE        ENCODED
    ===============  ===============  ===============
    Index            "I..."           "J..."
    Data             "D..."           "E..."
    Special          "S..."           "T..."
    ===============  ===============  ===============

Intermediate directories are always "@" or "+" as appropriate.


Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs.  The latter is used to detect stale objects in the cache and update
or retire them.


Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).


Security Model and SELinux
==========================

CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.

One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.

The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it is the target of an operation performed
by some other process (so signalling and suchlike still work correctly).


When the CacheFiles module is asked to bind to its cache, it:

 (1) Finds the security label attached to the root cache directory and uses
     that as the security label with which it will create files.  By default,
     this is::

        cachefiles_var_t

 (2) Finds the security label of the process which issued the bind request
     (presumed to be the cachefilesd daemon), which by default will be::

        cachefilesd_t

     and asks LSM to supply a security ID as which it should act given the
     daemon's label.  By default, this will be::

        cachefiles_kernel_t

     SELinux transitions the daemon's security ID to the module's security ID
     based on a rule of this form in the policy::

        type_transition <daemon's-ID> kernel_t : process <module's-ID>;

     For instance::

        type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;


The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.

The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories.  It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.


There are policy source files available in:

    https://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2

and later versions.  In that tarball, see the files::

    cachefilesd.te
    cachefilesd.fc
    cachefilesd.if

They are built and installed directly by the RPM.

If a non-RPM based system is being used, then copy the above files to their own
directory and run::

    make -f /usr/share/selinux/devel/Makefile
    semodule -i cachefilesd.pp

You will need checkpolicy and selinux-policy-devel installed prior to the
build.



By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, then either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.

For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see::

    /usr/share/doc/cachefilesd-*/move-cache.txt

This file is present when the cachefilesd rpm is installed; alternatively, the
document can be found in the sources.


A Note on Security
==================

CacheFiles makes use of the split security in the task_struct.  It allocates
its own task_security structure, and redirects current->cred to point to it
when it acts on behalf of another process, in that process's context.

The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly.  Therefore the VFS and LSM
may deny CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.

Furthermore, should CacheFiles create a file or directory, the security
parameters with which that object is created (UID, GID, security label) would
be derived from the process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).

What is required is to temporarily override the security of the process that
issued the system call.  We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.


So CacheFiles makes use of a logical split in the security between the
objective security (task->real_cred) and the subjective security (task->cred).
The objective security holds the intrinsic security properties of a process and
is never overridden.  This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).

The subjective security holds the active security properties of a process, and
may be overridden.  This is not seen externally, and is used when a process
acts upon another object, for example SIGKILLing another process or opening a
file.

LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.


Statistical Information
=======================

If CacheFiles is compiled with the following option enabled::

    CONFIG_CACHEFILES_HISTOGRAM=y

then it will gather certain statistics and display them through a proc file.

 /proc/fs/cachefiles/histogram

     ::

        cat /proc/fs/cachefiles/histogram
        JIFS  SECS  LOOKUPS   MKDIRS    CREATES
        ===== ===== ========= ========= =========

     This shows, for each duration between 0 jiffies and HZ-1 jiffies, the
     number of times a variety of tasks took that long to run.  The columns
     are as follows:

        =======  =======================================================
        COLUMN   TIME MEASUREMENT
        =======  =======================================================
        LOOKUPS  Length of time to perform a lookup on the backing fs
        MKDIRS   Length of time to perform a mkdir on the backing fs
        CREATES  Length of time to perform a create on the backing fs
        =======  =======================================================

     Each row shows the number of events that took a particular range of times.
     Each step is 1 jiffy in size.  The JIFS column indicates the particular
     jiffy range covered, and the SECS field the equivalent number of seconds.


Debugging
=========

If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
debugging enabled by adjusting the value in::

    /sys/module/cachefiles/parameters/debug

This is a bitmask of debugging streams to enable:

    =======  =======  ===============================  =======================
    BIT      VALUE    STREAM                           POINT
    =======  =======  ===============================  =======================
    0        1        General                          Function entry trace
    1        2                                         Function exit trace
    2        4                                         General
    =======  =======  ===============================  =======================

The appropriate set of values should be OR'd together and the result written to
the control file.  For example::

    echo $((1|4)) >/sys/module/cachefiles/parameters/debug

will turn on function entry tracing and the general debug points.
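
The mask can also be composed from names rather than raw numbers.  The sketch
below is a minimal illustration: the function ``cachefiles_debug_mask`` and
the stream names ``enter``, ``leave`` and ``debug`` are hypothetical labels
chosen here for bit values 1, 2 and 4; they are not options understood by the
kernel module or by cachefilesd.

```shell
#!/bin/sh
# Hypothetical helper: OR together the debug bit values named on the
# command line and print the resulting mask.
cachefiles_debug_mask() {
    mask=0
    for name in "$@"; do
        case $name in
            enter) mask=$((mask | 1)) ;;   # function entry trace
            leave) mask=$((mask | 2)) ;;   # function exit trace
            debug) mask=$((mask | 4)) ;;   # general debug points
            *) echo "unknown debug stream: $name" >&2; return 1 ;;
        esac
    done
    echo "$mask"
}

cachefiles_debug_mask enter debug    # prints 5
# As root, the result could then be written to the control file:
#   cachefiles_debug_mask enter leave >/sys/module/cachefiles/parameters/debug
```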