The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. You need to first of all create a
journal_t data structure. There are two calls to do this, depending on
how you decide to allocate the physical media on which the journal
resides. The jbd2_journal_init_inode() call is for journals stored in
filesystem inodes, and the jbd2_journal_init_dev() call is for
journals stored on a raw device (in a contiguous range of blocks). A
journal_t is a typedef for a struct that you handle through a pointer,
so when you are finally finished make sure you call
jbd2_journal_destroy() on it to free up any used kernel memory.

Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects that the space for the journal
was already allocated and initialized properly by the userspace tools.
When loading the journal you must call jbd2_journal_load() to process
the journal contents. If the client file system detects that the journal
contents do not need to be processed (or need not even be valid), it
may call jbd2_journal_wipe() to clear the journal contents before
calling jbd2_journal_load().

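As a rough illustration of the ordering described above, here is a minimal
sketch of a mount-time helper. The stub definitions are userspace stand-ins,
an assumption made purely so that the example compiles and runs outside the
kernel; in real code the functions come from include/linux/jbd2.h.
example_open_journal() and its needs_recovery flag are hypothetical names, and
the prototypes follow jbd2.h as I understand it, so check how your kernel
version reports failure from jbd2_journal_init_inode() (NULL or an error
pointer).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-ins (hypothetical bodies). They only record the order
 * of calls in a small trace string: I = init, W = wipe, L = load,
 * D = destroy. The real implementations live in fs/jbd2/.
 */
struct inode { int unused; };
typedef struct journal_s { char trace[8]; size_t n; } journal_t;

static journal_t the_journal;

static void record(journal_t *j, char c)
{
	if (j->n < sizeof(j->trace) - 1)
		j->trace[j->n++] = c;
}

static journal_t *jbd2_journal_init_inode(struct inode *inode)
{
	if (!inode)
		return NULL;	/* assumption: failure reported as NULL */
	the_journal = (journal_t){0};
	record(&the_journal, 'I');
	return &the_journal;
}

static int jbd2_journal_wipe(journal_t *j, int write)
{
	(void)write;
	record(j, 'W');
	return 0;
}

static int jbd2_journal_load(journal_t *j)
{
	record(j, 'L');
	return 0;
}

static int jbd2_journal_destroy(journal_t *j)
{
	record(j, 'D');
	return 0;
}

/*
 * Hypothetical helper: create the journal object, wipe it first when the
 * filesystem knows the contents need no processing, then load it.
 * Returns NULL on failure, after destroying the journal object.
 */
static journal_t *example_open_journal(struct inode *journal_inode,
				       bool needs_recovery)
{
	journal_t *journal = jbd2_journal_init_inode(journal_inode);
	int err = 0;

	if (!journal)
		return NULL;

	if (!needs_recovery)
		err = jbd2_journal_wipe(journal, 1);
	if (!err)
		err = jbd2_journal_load(journal);
	if (err) {
		jbd2_journal_destroy(journal);
		return NULL;
	}
	return journal;
}
```

The point of the sketch is only the call order: wipe (if wanted) comes before
load, and a journal object that fails to load is destroyed rather than leaked.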

Note that jbd2_journal_wipe(..,0) calls
jbd2_journal_skip_recovery() for you if it detects any outstanding
transactions in the journal, and similarly jbd2_journal_load() will
call jbd2_journal_recover() if necessary. I would advise reading
ext4_load_journal() in fs/ext4/super.c for examples of this stage.

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes; this is done
by wrapping them into transactions. Additionally you also need to wrap
the modification of each of the buffers with calls to the journal layer,
so it knows what modifications you are actually making. To do
this use jbd2_journal_start(), which returns a transaction handle.

jbd2_journal_start() and its counterpart jbd2_journal_stop(),
which indicates the end of a transaction, are nestable calls, so you can
reenter a transaction if necessary, but remember you must call
jbd2_journal_stop() the same number of times as
jbd2_journal_start() before the transaction is completed (or more
accurately leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.

Inside each transaction you need to wrap the modifications to the
individual buffers (blocks).
Before you start to modify a buffer you
need to call jbd2_journal_get_create_access() /
jbd2_journal_get_write_access() /
jbd2_journal_get_undo_access() as appropriate; this allows the
journalling layer to copy the unmodified
data if it needs to. After all, the buffer may be part of a previously
uncommitted transaction. At this point you are at last ready to modify a
buffer, and once you have done so you need to call
jbd2_journal_dirty_metadata(). Or if you've asked for access to a
buffer you now know is no longer required to be pushed back on the
device you can call jbd2_journal_forget() in much the same way as you
might have used bforget() in the past.

A jbd2_journal_flush() may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your put_super() you can call
jbd2_journal_destroy() to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember nothing commits
until the outermost jbd2_journal_stop(). This means you must complete
the transaction at the end of each file/inode/address etc. operation you
perform, so that the journalling system isn't re-entered on another
journal.
This matters because transactions can't be nested/batched across differing
journals, and a filesystem other than yours (say ext4) may be
modified in a later syscall.

The second case to bear in mind is that jbd2_journal_start() can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param). When it blocks it merely(!) needs to wait
for transactions to complete and be committed from other tasks, so
essentially we are waiting for jbd2_journal_stop(). So to avoid
deadlocks you must treat jbd2_journal_start() /
jbd2_journal_stop() as if they were semaphores and include them in
your semaphore ordering rules. Note that jbd2_journal_extend() has
similar blocking behaviour to jbd2_journal_start(), so you can deadlock
there just as easily as on jbd2_journal_start().

Try to reserve the right number of blocks the first time. ;-) This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at at least ext4_jbd.h to see the
basis on which ext4 makes these decisions.

Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits.
If you reuse these blocks and a crash happens,
there is no way to restore the contents of the reallocated blocks as of the
end of the last fully committed transaction. One simple way of ensuring
this is to mark blocks as free in internal in-memory block allocation
structures only after the transaction freeing them commits. Ext4 uses
a journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You ask the journalling
layer to call the callback by simply setting the
``journal->j_commit_callback`` function pointer; that function is then
called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.

JBD2 also provides a way to block all transaction updates via
jbd2_journal_lock_updates() /
jbd2_journal_unlock_updates(). Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::


    jbd2_journal_lock_updates()   // stop new stuff happening..
    jbd2_journal_flush()          // checkpoint everything.
    ..do stuff on stable fs
    jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DoS attacks with this should be obvious,
if you allow unprivileged userspace to trigger codepaths containing
these calls.

Fast commits
~~~~~~~~~~~~

JBD2 also allows you to perform file-system specific delta commits known as
fast commits. In order to use fast commits, you will need to set the following
callbacks that perform the corresponding work:

``journal->j_fc_cleanup_cb``: Cleanup function called after every full commit and
fast commit.

``journal->j_fc_replay_cb``: Replay function called for replay of fast commit
blocks.

The file system is free to perform fast commits as and when it wants as long as it
gets permission from JBD2 to do so by calling the function
:c:func:`jbd2_fc_begin_commit()`. Once a fast commit is done, the client
file system should tell JBD2 about it by calling
:c:func:`jbd2_fc_end_commit()`. If the file system wants JBD2 to perform a full
commit immediately after stopping the fast commit it can do so by calling
:c:func:`jbd2_fc_end_commit_fallback()`.
This is useful if the fast commit operation
fails for some reason and the only way to guarantee consistency is for JBD2 to
perform the full traditional commit.

JBD2 also provides helper functions to manage fast commit buffers. The file
system can use :c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()`
to allocate and wait on IO completion of fast commit buffers.

Currently, only Ext4 implements fast commits. For details of its implementation
of fast commits, please refer to the top level comments in
fs/ext4/fast_commit.c.

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes,
being each mount, each modification (transaction) and each changed
buffer, to tell the journalling layer about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those which are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__