=====================
Linux Filesystems API
=====================

The Linux VFS
=============

The Filesystem types
--------------------

.. kernel-doc:: include/linux/fs.h
   :internal:

The Directory Cache
-------------------

.. kernel-doc:: fs/dcache.c
   :export:

.. kernel-doc:: include/linux/dcache.h
   :internal:

Inode Handling
--------------

.. kernel-doc:: fs/inode.c
   :export:

.. kernel-doc:: fs/bad_inode.c
   :export:

Registration and Superblocks
----------------------------

.. kernel-doc:: fs/super.c
   :export:

File Locks
----------

.. kernel-doc:: fs/locks.c
   :export:

.. kernel-doc:: fs/locks.c
   :internal:

Other Functions
---------------

.. kernel-doc:: fs/mpage.c
   :export:

.. kernel-doc:: fs/namei.c
   :export:

.. kernel-doc:: fs/buffer.c
   :export:

.. kernel-doc:: block/bio.c
   :export:

.. kernel-doc:: fs/seq_file.c
   :export:

.. kernel-doc:: fs/filesystems.c
   :export:

.. kernel-doc:: fs/fs-writeback.c
   :export:

.. kernel-doc:: fs/block_dev.c
   :export:

.. kernel-doc:: fs/anon_inodes.c
   :export:

.. kernel-doc:: fs/attr.c
   :export:

.. kernel-doc:: fs/d_path.c
   :export:

.. kernel-doc:: fs/dax.c
   :export:

.. kernel-doc:: fs/direct-io.c
   :export:

.. kernel-doc:: fs/file_table.c
   :export:

.. kernel-doc:: fs/libfs.c
   :export:

.. kernel-doc:: fs/posix_acl.c
   :export:

.. kernel-doc:: fs/stat.c
   :export:

.. kernel-doc:: fs/sync.c
   :export:

.. kernel-doc:: fs/xattr.c
   :export:

The proc filesystem
===================

sysctl interface
----------------

.. kernel-doc:: kernel/sysctl.c
   :export:

proc filesystem interface
-------------------------

.. kernel-doc:: fs/proc/base.c
   :internal:

Events based on file descriptors
================================

.. kernel-doc:: fs/eventfd.c
   :export:

The Filesystem for Exporting Kernel Objects
===========================================

.. kernel-doc:: fs/sysfs/file.c
   :export:

.. kernel-doc:: fs/sysfs/symlink.c
   :export:

The debugfs filesystem
======================

debugfs interface
-----------------

.. kernel-doc:: fs/debugfs/inode.c
   :export:

.. kernel-doc:: fs/debugfs/file.c
   :export:

The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. You first need to create a
journal_t data structure. There are two calls to do this, depending on
how you decide to allocate the physical media on which the journal
resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
filesystem inodes, and the :c:func:`jbd2_journal_init_dev` call can be used
for journals stored on a raw device (in a contiguous range of blocks). A
journal_t is a typedef for a structure, and both calls return a pointer to a
dynamically allocated one, so when you are finally finished make sure you
call :c:func:`jbd2_journal_destroy` on it to free up any used kernel memory.

Once you have got your journal_t object, you need to 'mount' or load the
journal file. The journalling layer expects that the space for the journal
has already been allocated and initialized properly by the userspace tools.
When loading the journal you must call :c:func:`jbd2_journal_load` to process
the journal contents. If the client filesystem detects that the journal
contents do not need to be processed (or need not even have valid contents),
it may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
calling :c:func:`jbd2_journal_load`.
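
The ordering described above (init, optional wipe, load, and eventually
destroy) can be sketched as a userspace model. This is not kernel code: the
``model_*`` functions and ``struct model_journal`` are hypothetical stand-ins,
and the real JBD2 calls operate on on-disk state and cannot run outside the
kernel. The sketch only tracks whether outstanding transactions get replayed
when the journal is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a journal_t; NOT the JBD2 structure. */
struct model_journal {
	bool loaded;      /* set once model_journal_load() has run */
	int outstanding;  /* transactions found in the on-disk journal */
	int replayed;     /* transactions replayed by the "recovery" step */
};

/* Stands in for jbd2_journal_init_inode()/jbd2_journal_init_dev():
 * allocates the in-core object; the on-disk space is assumed to have
 * been created already by userspace tools. */
static struct model_journal *model_journal_init(int outstanding_on_disk)
{
	struct model_journal *j = calloc(1, sizeof(*j));

	if (j)
		j->outstanding = outstanding_on_disk;
	return j;
}

/* Stands in for jbd2_journal_wipe(journal, 0): discard the old contents
 * instead of replaying them. Must happen before the load step. */
static void model_journal_wipe(struct model_journal *j)
{
	assert(!j->loaded);
	j->outstanding = 0;
}

/* Stands in for jbd2_journal_load(): replays any outstanding
 * transactions, then marks the journal as usable. */
static int model_journal_load(struct model_journal *j)
{
	j->replayed += j->outstanding;
	j->outstanding = 0;
	j->loaded = true;
	return 0;
}

/* Stands in for jbd2_journal_destroy(): frees the in-core object. */
static void model_journal_destroy(struct model_journal *j)
{
	free(j);
}

/* Runs init -> [wipe] -> load -> destroy and reports how many
 * transactions the load step replayed (-1 if allocation failed). */
static int model_mount(int outstanding_on_disk, bool wipe_first)
{
	struct model_journal *j = model_journal_init(outstanding_on_disk);
	int replayed;

	if (!j)
		return -1;
	if (wipe_first)
		model_journal_wipe(j);
	model_journal_load(j);
	replayed = j->replayed;
	model_journal_destroy(j);
	return replayed;
}
```

Here ``model_mount(3, false)`` replays all three outstanding transactions
during load, while ``model_mount(3, true)`` replays none, mirroring a client
that decided the old journal contents were not worth processing.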

Note that ``jbd2_journal_wipe(..., 0)`` calls
:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
transactions in the journal, and similarly :c:func:`jbd2_journal_load` will
call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
:c:func:`ext4_load_journal` in fs/ext4/super.c for an example of this stage.

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes; this is done
by wrapping them into transactions. Additionally, you need to wrap
the modification of each of the buffers with calls to the journal layer,
so that it knows which modifications you are actually making. To do
this, use :c:func:`jbd2_journal_start`, which returns a transaction handle.

:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
which indicates the end of a transaction, are nestable calls, so you can
re-enter a transaction if necessary. Remember, however, that you must call
:c:func:`jbd2_journal_stop` the same number of times as
:c:func:`jbd2_journal_start` before the transaction is completed (or, more
accurately, leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.

Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call :c:func:`jbd2_journal_get_create_access()` /
:c:func:`jbd2_journal_get_write_access()` /
:c:func:`jbd2_journal_get_undo_access()` as appropriate. This allows the
journalling layer to copy the unmodified data if it needs to; after all,
the buffer may be part of a previously uncommitted transaction. At this
point you are at last ready to modify the buffer, and once you have done so
you need to call :c:func:`jbd2_journal_dirty_metadata`.
Or, if you've asked for access to a buffer that you now know no longer
needs to be pushed back to the device, you can call
:c:func:`jbd2_journal_forget` in much the same way as you might have used
:c:func:`bforget` in the past.

A :c:func:`jbd2_journal_flush` may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your :c:func:`put_super`, you can call
:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.

Unfortunately, there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember that nothing
commits until the outermost :c:func:`jbd2_journal_stop`. This means you must
complete the transaction at the end of each file/inode/address etc.
operation you perform, so that the journalling system isn't re-entered on
another journal. This is necessary because transactions can't be nested or
batched across different journals, and a filesystem other than yours (say
ext4) may be modified in a later syscall.

The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
if there isn't enough space in the journal for your transaction (based
on the passed ``nblocks`` parameter). When it blocks, it merely(!) needs to
wait for transactions from other tasks to complete and be committed, so
essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
deadlocks you must treat :c:func:`jbd2_journal_start` /
:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
your semaphore ordering rules. Note that :c:func:`jbd2_journal_extend` has
similar blocking behaviour to :c:func:`jbd2_journal_start`, so you can
deadlock here just as easily as on :c:func:`jbd2_journal_start`.

Try to reserve the right number of blocks the first time. ;-)
This will be the maximum number of blocks you are going to touch in this
transaction. I advise looking at least at ``fs/ext4/ext4_jbd2.h`` to see the
basis ext4 uses to make these decisions.

Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reused these blocks and a crash happened,
there would be no way to restore the contents of the reallocated blocks at
the end of the last fully committed transaction. One simple way of doing
this is to mark blocks as free in the internal in-memory block allocation
structures only after the transaction freeing them commits. Ext4 uses a
journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. To request this, simply
set the ``journal->j_commit_callback`` function pointer; that function is
then called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.

JBD2 also provides a way to block all transaction updates via
:c:func:`jbd2_journal_lock_updates()` /
:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::

    jbd2_journal_lock_updates(journal);   /* stop new stuff happening... */
    jbd2_journal_flush(journal);          /* checkpoint everything */
    /* ... do stuff on the stable fs ... */
    jbd2_journal_unlock_updates(journal); /* carry on with filesystem use */

The opportunities for abuse and DOS attacks with this should be obvious
if you allow unprivileged userspace to trigger codepaths containing
these calls.
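
The deferred-free strategy can be illustrated with a small userspace model.
Everything here is hypothetical and far simpler than ext4:
``struct model_fs`` is an invented allocator, its ``pending`` array plays the
role of ``transaction->t_private_list``, and its ``commit_callback`` pointer
plays the role of ``journal->j_commit_callback``. A block freed inside a
transaction stays marked in use until the commit callback runs, so it cannot
be handed out again before the freeing transaction is on disk.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NBLOCKS 8

/* Hypothetical in-memory allocator state; NOT an ext4 or JBD2 structure. */
struct model_fs {
	bool in_use[MODEL_NBLOCKS];   /* what the allocator may not hand out */
	int pending[MODEL_NBLOCKS];   /* blocks freed by the running transaction */
	int npending;                 /* plays the role of t_private_list */
	void (*commit_callback)(struct model_fs *fs); /* like j_commit_callback */
};

/* Allocate the lowest block the allocator believes is free, or -1. */
static int model_alloc(struct model_fs *fs)
{
	for (int b = 0; b < MODEL_NBLOCKS; b++) {
		if (!fs->in_use[b]) {
			fs->in_use[b] = true;
			return b;
		}
	}
	return -1;
}

/* Free a block inside a transaction: remember it, but leave it marked in
 * use so it cannot be reallocated before the freeing transaction commits. */
static void model_free(struct model_fs *fs, int b)
{
	assert(fs->npending < MODEL_NBLOCKS);
	fs->pending[fs->npending++] = b;
}

/* Commit callback: the transaction is now on disk, so the freed blocks
 * can finally be returned to the in-memory allocator. */
static void model_release_pending(struct model_fs *fs)
{
	for (int i = 0; i < fs->npending; i++)
		fs->in_use[fs->pending[i]] = false;
	fs->npending = 0;
}

/* Simulates the journal committing the running transaction. */
static void model_commit(struct model_fs *fs)
{
	if (fs->commit_callback)
		fs->commit_callback(fs);
}

/* Fills the device, frees block 3 inside a transaction, optionally lets
 * that transaction commit, then reports what the next allocation gets. */
static int model_realloc_after_free(bool commit_first)
{
	struct model_fs fs = { .commit_callback = model_release_pending };

	while (model_alloc(&fs) >= 0)
		continue;                 /* use up every block */
	model_free(&fs, 3);               /* delete inside a transaction */
	if (commit_first)
		model_commit(&fs);        /* the freeing transaction commits */
	return model_alloc(&fs);
}
```

Filling the device and freeing block 3 leaves nothing to allocate until the
simulated commit: ``model_realloc_after_free(false)`` returns -1, while
``model_realloc_after_free(true)`` returns 3. A real filesystem would also
need locking around these lists, which the model omits.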

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes,
namely each mount, each modification (transaction) and each changed
buffer, to tell the journalling layer about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those that are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__

splice API
==========

splice is a method for moving blocks of data around inside the kernel
without continually transferring them between the kernel and user space.

.. kernel-doc:: fs/splice.c

pipes API
=========

Pipe interfaces are all for in-kernel (builtin image) use. They are not
exported for use by modules.

.. kernel-doc:: include/linux/pipe_fs_i.h
   :internal:

.. kernel-doc:: fs/pipe.c

Encryption API
==============

A library which filesystems can hook into to support transparent
encryption of files and directories.

.. toctree::
   :maxdepth: 2

   fscrypt