=====================
Linux Filesystems API
=====================

The Linux VFS
=============

The Filesystem types
--------------------

.. kernel-doc:: include/linux/fs.h
   :internal:

The Directory Cache
-------------------

.. kernel-doc:: fs/dcache.c
   :export:

.. kernel-doc:: include/linux/dcache.h
   :internal:

Inode Handling
--------------

.. kernel-doc:: fs/inode.c
   :export:

.. kernel-doc:: fs/bad_inode.c
   :export:

Registration and Superblocks
----------------------------

.. kernel-doc:: fs/super.c
   :export:

File Locks
----------

.. kernel-doc:: fs/locks.c
   :export:

.. kernel-doc:: fs/locks.c
   :internal:

Other Functions
---------------

.. kernel-doc:: fs/mpage.c
   :export:

.. kernel-doc:: fs/namei.c
   :export:

.. kernel-doc:: fs/buffer.c
   :export:

.. kernel-doc:: block/bio.c
   :export:

.. kernel-doc:: fs/seq_file.c
   :export:

.. kernel-doc:: fs/filesystems.c
   :export:

.. kernel-doc:: fs/fs-writeback.c
   :export:

.. kernel-doc:: fs/block_dev.c
   :export:

The proc filesystem
===================

sysctl interface
----------------

.. kernel-doc:: kernel/sysctl.c
   :export:

proc filesystem interface
-------------------------

.. kernel-doc:: fs/proc/base.c
   :internal:

Events based on file descriptors
================================

.. kernel-doc:: fs/eventfd.c
   :export:

The Filesystem for Exporting Kernel Objects
===========================================

.. kernel-doc:: fs/sysfs/file.c
   :export:

.. kernel-doc:: fs/sysfs/symlink.c
   :export:

The debugfs filesystem
======================

debugfs interface
-----------------

.. kernel-doc:: fs/debugfs/inode.c
   :export:

.. kernel-doc:: fs/debugfs/file.c
   :export:

The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. You need to first of all create a
journal_t data structure. There are two calls to do this, depending on
how you decide to allocate the physical media on which the journal
resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
filesystem inodes, while the :c:func:`jbd2_journal_init_dev` call can be used
for journals stored on a raw device (in a contiguous range of blocks). Both
calls return a pointer to a journal_t (a typedef for a struct), so when you
are finally finished, make sure you call :c:func:`jbd2_journal_destroy` on it
to free up any used kernel memory.

Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects that the space for the journal
has already been allocated and initialized properly by the userspace tools.
When loading the journal you must call :c:func:`jbd2_journal_load` to process
the journal contents. If the client file system detects that the journal
contents do not need to be processed (or need not even have valid contents),
it may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
calling :c:func:`jbd2_journal_load`.

Note that ``jbd2_journal_wipe(..,0)`` calls
:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
transactions in the journal, and similarly :c:func:`jbd2_journal_load` will
call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
:c:func:`ext4_load_journal` in fs/ext4/super.c for examples of this stage.

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes; this is done
by wrapping them into transactions.
Additionally, you need to wrap
the modification of each of the buffers with calls to the journal layer,
so that it knows which modifications you are actually making. To start a
transaction, use :c:func:`jbd2_journal_start`, which returns a transaction
handle.

:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
which indicates the end of a transaction, are nestable calls, so you can
reenter a transaction if necessary. Remember, though, that you must call
:c:func:`jbd2_journal_stop` the same number of times as
:c:func:`jbd2_journal_start` before the transaction is completed (or more
accurately leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.

Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call :c:func:`jbd2_journal_get_create_access()` /
:c:func:`jbd2_journal_get_write_access()` /
:c:func:`jbd2_journal_get_undo_access()` as appropriate; this allows the
journalling layer to copy the unmodified
data if it needs to. After all, the buffer may be part of a previous
uncommitted transaction. At this point you are at last ready to modify a
buffer, and once you have done so you need to call
:c:func:`jbd2_journal_dirty_metadata`. Or, if you've asked for access to a
buffer that you now know is no longer required to be pushed back to the
device, you can call :c:func:`jbd2_journal_forget` in much the same way as you
might have used :c:func:`bforget` in the past.

:c:func:`jbd2_journal_flush` may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your :c:func:`put_super`, you can call
:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a
deadlock.
The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember that nothing commits
until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
the transaction at the end of each file/inode/address etc. operation you
perform, so that the journalling system isn't re-entered on another
journal: transactions can't be nested/batched across differing
journals, and a filesystem other than yours (say ext4) may be
modified in a later syscall.

The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param). When it blocks it merely(!) needs to wait
for transactions from other tasks to complete and be committed, so
essentially we are waiting for :c:func:`jbd2_journal_stop`. To avoid
deadlocks you must therefore treat :c:func:`jbd2_journal_start` /
:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
your semaphore ordering rules. Note that :c:func:`jbd2_journal_extend` has
similar blocking behaviour to :c:func:`jbd2_journal_start`, so you can
deadlock here just as easily as on :c:func:`jbd2_journal_start`.

Try to reserve the right number of blocks the first time ;-). This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at least at ext4_jbd.h to see the
basis on which ext4 makes these decisions.

Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reuse these blocks and a crash happens,
there is no way to restore the contents of the reallocated blocks as of the
end of the last fully committed transaction.
One simple way of doing
this is to mark blocks as free in internal in-memory block allocation
structures only after the transaction freeing them commits. Ext4 uses
the journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You ask the journalling
layer to call the callback by simply setting the
``journal->j_commit_callback`` function pointer; that function is then
called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.

JBD2 also provides a way to block all transaction updates via
:c:func:`jbd2_journal_lock_updates()` /
:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::


        jbd2_journal_lock_updates()   // stop new stuff happening..
        jbd2_journal_flush()          // checkpoint everything.
        ..do stuff on stable fs
        jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DoS attacks with this should be obvious
if you allow unprivileged userspace to trigger codepaths containing
these calls.

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes,
namely each mount, each modification (transaction) and each changed
buffer, to tell the journalling layer about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those which are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__

splice API
==========

splice is a method for moving blocks of data around inside the kernel,
without continually transferring them between the kernel and user space.

.. kernel-doc:: fs/splice.c

pipes API
=========

Pipe interfaces are all for in-kernel (builtin image) use. They are not
exported for use by modules.

.. kernel-doc:: include/linux/pipe_fs_i.h
   :internal:

.. kernel-doc:: fs/pipe.c