.. SPDX-License-Identifier: GPL-2.0

==============================
Network Filesystem Caching API
==============================

Fscache provides an API by which a network filesystem can make use of local
caching facilities. The API is arranged around a number of principles:

 (1) A cache is logically organised into volumes and data storage objects
     within those volumes.

 (2) Volumes and data storage objects are represented by various types of
     cookie.

 (3) Cookies have keys that distinguish them from their peers.

 (4) Cookies have coherency data that allows a cache to determine if the
     cached data is still valid.

 (5) I/O is done asynchronously where possible.

This API is used by::

	#include <linux/fscache.h>.

.. This document contains the following sections:

 (1) Overview
 (2) Volume registration
 (3) Data file registration
 (4) Declaring a cookie to be in use
 (5) Resizing a data file (truncation)
 (6) Data I/O API
 (7) Data file coherency
 (8) Data file invalidation
 (9) Write back resource management
 (10) Caching of local modifications
 (11) Page release and invalidation


Overview
========

The fscache hierarchy is organised on two levels from a network filesystem's
point of view. The upper level represents "volumes" and the lower level
represents "data storage objects". These are represented by two types of
cookie, hereafter referred to as "volume cookies" and "cookies".

A network filesystem acquires a volume cookie for a volume using a volume key,
which represents all the information that defines that volume (e.g. cell name
or server address, volume ID or share name). This must be rendered as a
printable string that can be used as a directory name (ie. no '/' characters
and shouldn't begin with a '.'). The maximum name length is one less than the
maximum size of a filename component (allowing the cache backend one char for
its own purposes).
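
The volume key rules above can be checked mechanically. The following is a
minimal user-space sketch, not part of the fscache API: the helper name
``demo_volume_key_ok()`` and the macro ``DEMO_VOLUME_KEY_MAX`` are
hypothetical, the 254-character limit assumes a 255-byte filename component,
and rejecting an empty key is an extra assumption made here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed limit: one less than a 255-byte filename component. */
#define DEMO_VOLUME_KEY_MAX 254

/* Hypothetical helper, not part of the fscache API. */
static bool demo_volume_key_ok(const char *key)
{
	size_t len = strlen(key);
	size_t i;

	/* Rejecting an empty key is an assumption of this sketch. */
	if (len == 0 || len > DEMO_VOLUME_KEY_MAX)
		return false;

	/* The key is used as a directory name, so no leading '.'. */
	if (key[0] == '.')
		return false;

	/* Every character must be printable and must not be '/'. */
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)key[i];

		if (c == '/' || !isprint(c))
			return false;
	}
	return true;
}
```

A filesystem could run such a check on the string it builds before handing it
to ``fscache_acquire_volume()``; the key format itself (for example a
comma-separated filesystem name, server and volume ID) is left to the
filesystem.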

A filesystem would typically have a volume cookie for each superblock.

The filesystem then acquires a cookie for each file within that volume using an
object key. Object keys are binary blobs and only need to be unique within
their parent volume. The cache backend is responsible for rendering the binary
blob into something it can use and may employ hash tables, trees or whatever to
improve its ability to find an object. This is transparent to the network
filesystem.

A filesystem would typically have a cookie for each inode, and would acquire it
in iget and relinquish it when evicting the inode.

Once it has a cookie, the filesystem needs to mark the cookie as being in use.
This causes fscache to send the cache backend off to look up/create resources
for the cookie in the background, to check its coherency and, if necessary, to
mark the object as being under modification.

A filesystem would typically "use" the cookie in its file open routine and
unuse it in file release. It also needs to use the cookie around calls to
truncate the cookie locally. It *also* needs to use the cookie when the
pagecache becomes dirty and unuse it when writeback is complete. This is
slightly tricky, and provision is made for it.

When performing a read, write or resize on a cookie, the filesystem must first
begin an operation. This copies the resources into a holding struct and puts
extra pins into the cache to stop cache withdrawal from tearing down the
structures being used. The actual operation can then be issued and conflicting
invalidations can be detected upon completion.

The filesystem is expected to use netfslib to access the cache, but that's not
actually required and it can use the fscache I/O API directly.
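
The order of calls described in this overview can be sketched as follows.
This is a user-space illustration only: the structs and function bodies are
stand-ins written for this example (they merely record which call was made),
``demo_loff_t`` stands in for ``loff_t``, and the volume key and object key
are hypothetical. Tying volume relinquishment to unmount is an inference
from the one-volume-cookie-per-superblock convention.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * User-space stand-ins so that the call order compiles outside the kernel.
 * They are NOT the kernel's definitions; only the ordering is the point.
 */
typedef long long demo_loff_t;		/* stands in for loff_t */
struct fscache_volume { const char *key; };
struct fscache_cookie { struct fscache_volume *volume; int n_users; };

static char demo_trace[256];		/* records the order of calls */
static struct fscache_volume demo_volume;
static struct fscache_cookie demo_cookie;

static void demo_note(const char *step)
{
	size_t used = strlen(demo_trace);

	snprintf(demo_trace + used, sizeof(demo_trace) - used, "%s;", step);
}

static struct fscache_volume *fscache_acquire_volume(const char *volume_key,
		const char *cache_name, const void *coherency_data,
		size_t coherency_len)
{
	(void)cache_name; (void)coherency_data; (void)coherency_len;
	demo_volume.key = volume_key;
	demo_note("acquire_volume");
	return &demo_volume;
}

static struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_volume *volume, unsigned char advice,
		const void *index_key, size_t index_key_len,
		const void *aux_data, size_t aux_data_len,
		demo_loff_t object_size)
{
	(void)advice; (void)index_key; (void)index_key_len;
	(void)aux_data; (void)aux_data_len; (void)object_size;
	demo_cookie.volume = volume;
	demo_note("acquire_cookie");
	return &demo_cookie;
}

static void fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
{
	(void)will_modify;
	cookie->n_users++;
	demo_note("use");
}

static void fscache_unuse_cookie(struct fscache_cookie *cookie,
		const void *aux_data, const demo_loff_t *object_size)
{
	(void)aux_data; (void)object_size;
	cookie->n_users--;
	demo_note("unuse");
}

static void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
{
	(void)cookie; (void)retire;
	demo_note("relinquish_cookie");
}

static void fscache_relinquish_volume(struct fscache_volume *volume,
		const void *coherency_data, bool invalidate)
{
	(void)volume; (void)coherency_data; (void)invalidate;
	demo_note("relinquish_volume");
}

/* Mount, iget, open, release, evict, unmount, in that order. */
static const char *demo_lifecycle(void)
{
	unsigned int ino = 42;		/* hypothetical object key */
	struct fscache_volume *vol;
	struct fscache_cookie *cookie;

	demo_trace[0] = '\0';
	vol = fscache_acquire_volume("demofs,server1,vol7", NULL, NULL, 0);
	cookie = fscache_acquire_cookie(vol, 0, &ino, sizeof(ino), NULL, 0, 0);
	fscache_use_cookie(cookie, false);		/* file open */
	fscache_unuse_cookie(cookie, NULL, NULL);	/* file release */
	fscache_relinquish_cookie(cookie, false);	/* inode eviction */
	fscache_relinquish_volume(vol, NULL, false);	/* unmount */
	return demo_trace;
}
```

The property worth noticing is the nesting: every use is paired with an
unuse before the cookie is relinquished, and every cookie is relinquished
before its volume.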


Volume Registration
===================

The first step for a network filesystem is to acquire a volume cookie for the
volume it wants to access::

	struct fscache_volume *
	fscache_acquire_volume(const char *volume_key,
			       const char *cache_name,
			       const void *coherency_data,
			       size_t coherency_len);

This function creates a volume cookie with the specified volume key as its name
and notes the coherency data.

The volume key must be a printable string with no '/' characters in it. It
should begin with the name of the filesystem and should be no longer than 254
characters. It should uniquely represent the volume and will be matched with
what's stored in the cache.

The caller may also specify the name of the cache to use. If specified,
fscache will look up or create a cache cookie of that name and will use a cache
of that name if it is online or comes online. If no cache name is specified,
it will use the first cache that comes to hand and set the name to that.

The specified coherency data is stored in the cookie and will be matched
against coherency data stored on disk. The data pointer may be NULL if no data
is provided. If the coherency data doesn't match, the entire cache volume will
be invalidated.

This function can return errors such as EBUSY if the volume key is already in
use by an acquired volume or ENOMEM if an allocation failure occurred. It may
also return a NULL volume cookie if fscache is not enabled. It is safe to
pass a NULL cookie to any function that takes a volume cookie. This will
cause that function to do nothing.


When the network filesystem has finished with a volume, it should relinquish it
by calling::

	void fscache_relinquish_volume(struct fscache_volume *volume,
				       const void *coherency_data,
				       bool invalidate);

This will cause the volume to be committed or removed, and if sealed the
coherency data will be set to the value supplied. The amount of coherency data
must match the length specified when the volume was acquired. Note that all
data cookies obtained in this volume must be relinquished before the volume is
relinquished.


Data File Registration
======================

Once it has a volume cookie, a network filesystem can use it to acquire a
cookie for data storage::

	struct fscache_cookie *
	fscache_acquire_cookie(struct fscache_volume *volume,
			       u8 advice,
			       const void *index_key,
			       size_t index_key_len,
			       const void *aux_data,
			       size_t aux_data_len,
			       loff_t object_size);

This creates the cookie in the volume using the specified index key. The index
key is a binary blob of the given length and must be unique for the volume.
This is saved into the cookie. There are no restrictions on the content, but
its length shouldn't exceed about three quarters of the maximum filename length
to allow for encoding.

The caller should also pass in a piece of coherency data in aux_data. A buffer
of size aux_data_len will be allocated and the coherency data copied in. It is
assumed that the size is invariant over time. The coherency data is used to
check the validity of data in the cache. Functions are provided by which the
coherency data can be updated.

The file size of the object being cached should also be provided. This may be
used to trim the data and will be stored with the coherency data.

This function never returns an error, though it may return a NULL cookie on
allocation failure or if fscache is not enabled.
It is safe to pass in a NULL
volume cookie and pass the NULL cookie returned to any function that takes it.
This will cause that function to do nothing.


When the network filesystem has finished with a cookie, it should relinquish it
by calling::

	void fscache_relinquish_cookie(struct fscache_cookie *cookie,
				       bool retire);

This will cause fscache to either commit the storage backing the cookie or
delete it.


Marking A Cookie In-Use
=======================

Once a cookie has been acquired by a network filesystem, the filesystem should
tell fscache when it intends to use the cookie (typically done on file open)
and should say when it has finished with it (typically on file close)::

	void fscache_use_cookie(struct fscache_cookie *cookie,
				bool will_modify);
	void fscache_unuse_cookie(struct fscache_cookie *cookie,
				  const void *aux_data,
				  const loff_t *object_size);

The *use* function tells fscache that the filesystem will use the cookie and,
additionally, indicates whether the user intends to modify the contents
locally. If not yet done, this will trigger the cache backend to go and gather
the resources it needs to access/store data in the cache. This is done in the
background, and so may not be complete by the time the function returns.

The *unuse* function indicates that a filesystem has finished using a cookie.
It optionally updates the stored coherency data and object size and then
decreases the in-use counter. When the last user unuses the cookie, it is
scheduled for garbage collection. If not reused within a short time, the
resources will be released to reduce system resource consumption.

A cookie must be marked in-use before it can be accessed for read, write or
resize - and an in-use mark must be kept whilst there is dirty data in the
pagecache in order to avoid an oops due to trying to open a file during process
exit.
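
The in-use accounting described above can be pictured with a small model.
This is a user-space sketch under stated assumptions, not fscache's
implementation: the struct, its fields and the three ``demo_*`` functions are
hypothetical, and the background lookup is collapsed into a flag that is set
immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a cookie's in-use accounting; all names are hypothetical. */
struct demo_cookie {
	unsigned int n_active;	/* outstanding in-use marks */
	bool resources_ready;	/* backend resources looked up/created */
	bool gc_scheduled;	/* queued for delayed resource release */
};

static void demo_use(struct demo_cookie *c)
{
	c->n_active++;
	c->gc_scheduled = false;	/* reuse cancels pending collection */
	c->resources_ready = true;	/* the real lookup runs in the background */
}

static void demo_unuse(struct demo_cookie *c)
{
	assert(c->n_active > 0);	/* every unuse must pair with a use */
	if (--c->n_active == 0)
		c->gc_scheduled = true;	/* last user gone */
}

/* Called when the grace period expires without the cookie being reused. */
static void demo_gc_expired(struct demo_cookie *c)
{
	if (c->gc_scheduled && c->n_active == 0) {
		c->resources_ready = false;
		c->gc_scheduled = false;
	}
}
```

In this model two opens followed by one release leave the resources pinned;
only the final unuse schedules collection, and a use that arrives before the
grace period expires keeps the resources alive.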

Note that in-use marks are cumulative. For each time a cookie is marked
in-use, it must be unused.


Resizing A Data File (Truncation)
=================================

If a network filesystem file is resized locally by truncation, the following
should be called to notify the cache::

	void fscache_resize_cookie(struct fscache_cookie *cookie,
				   loff_t new_size);

The caller must have first marked the cookie in-use. The cookie and the new
size are passed in and the cache is synchronously resized. This is expected to
be called from the ``->setattr()`` inode operation under the inode lock.


Data I/O API
============

To do data I/O operations directly through a cookie, the following functions
are available::

	int fscache_begin_read_operation(struct netfs_cache_resources *cres,
					 struct fscache_cookie *cookie);
	int fscache_read(struct netfs_cache_resources *cres,
			 loff_t start_pos,
			 struct iov_iter *iter,
			 enum netfs_read_from_hole read_hole,
			 netfs_io_terminated_t term_func,
			 void *term_func_priv);
	int fscache_write(struct netfs_cache_resources *cres,
			  loff_t start_pos,
			  struct iov_iter *iter,
			  netfs_io_terminated_t term_func,
			  void *term_func_priv);

The *begin* function sets up an operation, attaching the resources required to
the cache resources block from the cookie. Assuming it doesn't return an error
(for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do
nothing), then one of the other two functions can be issued.

The *read* and *write* functions initiate a direct-IO operation. Both take the
previously set up cache resources block, an indication of the start file
position, and an I/O iterator that describes the buffer and indicates the
amount of data.

The read function also takes a parameter to indicate how it should handle a
partially populated region (a hole) in the disk content.
This may be to ignore
it, skip over an initial hole and place zeros in the buffer or give an error.

The read and write functions can be given an optional termination function that
will be run on completion::

	typedef
	void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error,
				      bool was_async);

If a termination function is given, the operation will be run asynchronously
and the termination function will be called upon completion. If not given, the
operation will be run synchronously. Note that in the asynchronous case, it is
possible for the operation to complete before the function returns.

Both the read and write functions end the operation when they complete,
detaching any pinned resources.

The read operation will fail with ESTALE if invalidation occurred whilst the
operation was ongoing.


Data File Coherency
===================

To request an update of the coherency data and file size on a cookie, the
following should be called::

	void fscache_update_cookie(struct fscache_cookie *cookie,
				   const void *aux_data,
				   const loff_t *object_size);

This will update the cookie's coherency data and/or file size.


Data File Invalidation
======================

Sometimes it will be necessary to invalidate an object that contains data.
Typically this will be necessary when the server informs the network filesystem
of a remote third-party change - at which point the filesystem has to throw
away the state and cached data that it had for a file and reload from the
server.
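
A read that races with an invalidation fails with ESTALE, as noted in the
Data I/O section. One way to picture how such a race can be detected is a
counter on the cookie that each invalidation bumps and that each operation
samples when it begins. The sketch below is a user-space model under that
assumption; the struct, field and function names are hypothetical and do not
correspond to kernel symbols.

```c
#include <errno.h>

/* Toy model; struct and field names are hypothetical. */
struct demo_inval_cookie {
	unsigned int inval_counter;	/* bumped by each invalidation */
};

struct demo_read_op {
	const struct demo_inval_cookie *cookie;
	unsigned int inval_at_begin;	/* counter value seen at begin time */
};

/* Begin an operation: remember which generation of data we are reading. */
static void demo_begin_read(struct demo_read_op *op,
			    const struct demo_inval_cookie *cookie)
{
	op->cookie = cookie;
	op->inval_at_begin = cookie->inval_counter;
}

/* Invalidate: any operation begun before this point is now stale. */
static void demo_invalidate(struct demo_inval_cookie *cookie)
{
	cookie->inval_counter++;
}

/* Returns the byte count on success or -ESTALE if the data went stale. */
static long demo_end_read(const struct demo_read_op *op, long transferred)
{
	if (op->cookie->inval_counter != op->inval_at_begin)
		return -ESTALE;
	return transferred;
}
```

The check happens on completion rather than at submission, which matches the
statement in the overview that conflicting invalidations can be detected upon
completion.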

To indicate that a cache object should be invalidated, the following should be
called::

	void fscache_invalidate(struct fscache_cookie *cookie,
				const void *aux_data,
				loff_t size,
				unsigned int flags);

This increases the invalidation counter in the cookie to cause outstanding
reads to fail with -ESTALE, sets the coherency data and file size from the
information supplied, blocks new I/O on the cookie and dispatches the cache to
go and get rid of the old data.

Invalidation runs asynchronously in a worker thread so that it doesn't block
too much.


Write-Back Resource Management
==============================

To write data to the cache from network filesystem writeback, the cache
resources required need to be pinned at the point the modification is made (for
instance when the page is marked dirty) as it's not possible to open a file in
a thread that's exiting.

The following facilities are provided to manage this:

 * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an
   in-use is held on the cookie for this inode. It can only be changed if the
   inode lock is held.

 * A flag, ``unpinned_fscache_wb`` is placed in the ``writeback_control``
   struct that gets set if ``__writeback_single_inode()`` clears
   ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared.

To support this, the following functions are provided::

	int fscache_set_page_dirty(struct page *page,
				   struct fscache_cookie *cookie);
	void fscache_unpin_writeback(struct writeback_control *wbc,
				     struct fscache_cookie *cookie);
	void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
					   struct inode *inode,
					   const void *aux);

The *set* function is intended to be called from the filesystem's
``set_page_dirty`` address space operation.
If ``I_PINNING_FSCACHE_WB`` is not
set, it sets that flag and increments the use count on the cookie (the caller
must already have called ``fscache_use_cookie()``).

The *unpin* function is intended to be called from the filesystem's
``write_inode`` superblock operation. It cleans up after writing by unusing
the cookie if unpinned_fscache_wb is set in the writeback_control struct.

The *clear* function is intended to be called from the netfs's ``evict_inode``
superblock operation. It must be called *after*
``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans
up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to
be updated.


Caching of Local Modifications
==============================

If a network filesystem has locally modified data that it wants to write to the
cache, it needs to mark the pages to indicate that a write is in progress, and
if the mark is already present, it needs to wait for it to be removed first
(presumably due to an already in-progress operation). This prevents multiple
competing DIO writes to the same storage in the cache.

Firstly, the netfs should determine if caching is available by doing something
like::

	bool caching = fscache_cookie_enabled(cookie);

If caching is to be attempted, pages should be waited for and then marked using
the following functions provided by the netfs helper library::

	void set_page_fscache(struct page *page);
	void wait_on_page_fscache(struct page *page);
	int wait_on_page_fscache_killable(struct page *page);

Once all the pages in the span are marked, the netfs can ask fscache to
schedule a write of that region::

	void fscache_write_to_cache(struct fscache_cookie *cookie,
				    struct address_space *mapping,
				    loff_t start, size_t len, loff_t i_size,
				    netfs_io_terminated_t term_func,
				    void *term_func_priv,
				    bool caching);

And if an error occurs before that point is reached, the marks can be removed
by calling::

	void fscache_clear_page_bits(struct fscache_cookie *cookie,
				     struct address_space *mapping,
				     loff_t start, size_t len,
				     bool caching);

In both of these functions, the cookie representing the cache object to be
written to and a pointer to the mapping to which the source pages are attached
are passed in; start and len indicate the position and size of the region
that's going to be written (it doesn't have to align to page boundaries
necessarily, but it does have to align to DIO boundaries on the backing
filesystem). The caching parameter indicates whether caching should be
attempted; if it is false, the functions do nothing.

The write function takes some additional parameters: i_size indicates the size
of the netfs file and term_func indicates an optional completion function, to
which term_func_priv will be passed, along with the error or amount written.

Note that the write function will always run asynchronously and will unmark all
the pages upon completion before calling term_func.
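
The DIO alignment requirement on start and len can be checked before a write
is scheduled. The helper below is a hypothetical user-space sketch, not an
fscache function: ``demo_region_dio_aligned()`` and its ``dio_block``
parameter are introduced here for illustration, and the sketch assumes the
backing filesystem's DIO granule is a power of two whose value the caller
already knows.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether [start, start + len) begins and ends
 * on multiples of dio_block. dio_block is an assumed power-of-two granule;
 * the real value depends on the backing filesystem.
 */
static bool demo_region_dio_aligned(long long start, size_t len,
				    size_t dio_block)
{
	/* Reject zero and non-power-of-two granules. */
	if (dio_block == 0 || (dio_block & (dio_block - 1)) != 0)
		return false;

	/* File positions are never negative. */
	if (start < 0)
		return false;

	return ((unsigned long long)start & (dio_block - 1)) == 0 &&
	       (len & (dio_block - 1)) == 0;
}
```

With a 512-byte granule, a region starting at 512 with length 1024 passes,
whereas one starting at 100 does not, even though neither is page-aligned.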


Page Release and Invalidation
=============================

Fscache keeps track of whether we have any data in the cache yet for a cache
object we've just created. It knows it doesn't have to do any reading until it
has done a write and then the page it wrote from has been released by the VM,
after which it *has* to look in the cache.

To inform fscache that a page might now be in the cache, the following function
should be called from the ``releasepage`` address space op::

	void fscache_note_page_release(struct fscache_cookie *cookie);

if the page has been released (ie. releasepage returned true).

Page release and page invalidation should also wait for any mark left on the
page to say that a DIO write is underway from that page::

	void wait_on_page_fscache(struct page *page);
	int wait_on_page_fscache_killable(struct page *page);


API Function Reference
======================

.. kernel-doc:: include/linux/fscache.h
