The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. First of all you need to create a
journal_t data structure. There are two calls to do this, depending on
where the journal resides on the physical media:
jbd2_journal_init_inode() is for journals stored in filesystem inodes,
and jbd2_journal_init_dev() is for journals stored on a raw device (in a
contiguous range of blocks). journal_t is a typedef for a struct, and
both calls hand you a pointer to one, so when you are finally finished
make sure you call jbd2_journal_destroy() on it to free up any used
kernel memory.

Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects that the space for the
journal has already been allocated and initialized properly by the
userspace tools. When loading the journal you must call
jbd2_journal_load() to process the journal contents. If the client file
system detects that the journal contents do not need to be processed (or
need not even be valid), it may call jbd2_journal_wipe() to clear the
journal contents before calling jbd2_journal_load().

Note that jbd2_journal_wipe(..,0) calls
jbd2_journal_skip_recovery() for you if it detects any outstanding
transactions in the journal, and similarly jbd2_journal_load() will
call jbd2_journal_recover() if necessary. I would advise reading
ext4_load_journal() in fs/ext4/super.c for an example of this stage.

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes. This is done
by wrapping them into transactions. You also need to wrap the
modification of each of the buffers with calls to the journal layer, so
that it knows which modifications you are actually making. To do this,
use jbd2_journal_start(), which returns a transaction handle.

jbd2_journal_start() and its counterpart jbd2_journal_stop(), which
indicates the end of a transaction, are nestable calls, so you can
reenter a transaction if necessary. Remember, though, that you must call
jbd2_journal_stop() the same number of times as
jbd2_journal_start() before the transaction is completed (or, more
accurately, leaves the update phase). Ext4/VFS makes use of this feature
to simplify handling of inode dirtying, quota support, etc.
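
The nesting rule can be pictured with a small userspace model. This is
only a sketch: toy_handle, toy_start() and toy_stop() are hypothetical
names made up for the illustration, not JBD2 API, and a single global
stands in for the per-task handle slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are JBD2 API. */
struct toy_handle {
	int refs;	/* start calls not yet matched by a stop */
	bool updating;	/* still in the update phase? */
};

/* One handle slot per task; a single global is enough for this model. */
static struct toy_handle *current_handle;

/* Models a start call: re-entry only bumps the reference count. */
static struct toy_handle *toy_start(struct toy_handle *storage)
{
	if (current_handle) {
		current_handle->refs++;
		return current_handle;
	}
	storage->refs = 1;
	storage->updating = true;
	current_handle = storage;
	return storage;
}

/* Models a stop call: only the outermost stop ends the update phase. */
static void toy_stop(struct toy_handle *h)
{
	assert(h->refs > 0);
	if (--h->refs == 0) {
		h->updating = false;
		current_handle = NULL;
	}
}
```

In this model a second toy_start() hands back the same handle, and the
update phase only ends once toy_stop() has been called as many times as
toy_start().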

Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call jbd2_journal_get_create_access() /
jbd2_journal_get_write_access() /
jbd2_journal_get_undo_access() as appropriate. This allows the
journalling layer to copy the unmodified data if it needs to; after all,
the buffer may be part of a previous, as yet uncommitted, transaction.
At this point you are at last ready to modify a buffer, and once you
have done so you need to call jbd2_journal_dirty_metadata(). Or, if
you've asked for access to a buffer that you now know no longer needs to
be pushed back to the device, you can call jbd2_journal_forget() in much
the same way as you might have used bforget() in the past.
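
As a rough illustration of the sequence just described, here is a
minimal sketch of one small transaction touching a single metadata
buffer. It is kernel code, so it only builds inside a kernel tree.
myfs_journalled_poke(), its offset/value arguments and the one-block
reservation are made up for the example, and the caller is assumed to
pass an offset inside the buffer. Check the prototypes in
include/linux/jbd2.h for your kernel version before relying on it.

```c
#include <linux/buffer_head.h>
#include <linux/err.h>
#include <linux/jbd2.h>

/*
 * Hypothetical helper: journal a one-byte change to a metadata buffer.
 * The name, the arguments and the single-block reservation are made up
 * for this sketch; only the jbd2_journal_*() calls come from the text.
 */
static int myfs_journalled_poke(journal_t *journal, struct buffer_head *bh,
				unsigned int offset, u8 value)
{
	handle_t *handle;
	int err, err2;

	/* Reserve space for the one buffer this transaction will touch. */
	handle = jbd2_journal_start(journal, 1);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	/* Let the journal copy the unmodified data first if it needs to. */
	err = jbd2_journal_get_write_access(handle, bh);
	if (!err) {
		/* Assumption: the caller guarantees offset < bh->b_size. */
		bh->b_data[offset] = value;

		/* Tell the journal that the buffer has been modified. */
		err = jbd2_journal_dirty_metadata(handle, bh);
	}

	/* Always pair the start with a stop, even on error. */
	err2 = jbd2_journal_stop(handle);
	return err ? err : err2;
}
```

A real filesystem would normally size the reservation from the worst
case for the whole operation rather than hard-coding 1.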

jbd2_journal_flush() may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your put_super(), you can call
jbd2_journal_destroy() to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember that nothing
commits until the outermost jbd2_journal_stop(). This means you must
complete the transaction at the end of each file/inode/address etc.
operation you perform, so that the journalling system isn't re-entered
on another journal. Transactions can't be nested/batched across
differing journals, and a filesystem other than yours (say ext4) may be
modified in a later syscall.

The second case to bear in mind is that jbd2_journal_start() can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param). When it blocks it merely(!) needs to wait
for transactions from other tasks to complete and be committed, so
essentially we are waiting for jbd2_journal_stop(). To avoid deadlocks
you must therefore treat jbd2_journal_start() /
jbd2_journal_stop() as if they were semaphores and include them in
your semaphore ordering rules. Note that jbd2_journal_extend() has
similar blocking behaviour to jbd2_journal_start(), so you can deadlock
here just as easily as on jbd2_journal_start().

Try to reserve the right number of blocks the first time. ;-) This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at ext4_jbd.h, at least, to see the
basis on which ext4 makes these decisions.

Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reuse these blocks and a crash happens,
there is no way to restore the contents that the reallocated blocks had
at the end of the last fully committed transaction. One simple way of
doing this is to mark blocks as free in internal in-memory block
allocation structures only after the transaction freeing them commits.
Ext4 uses a journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You ask the journalling
layer to call the callback by simply setting the
``journal->j_commit_callback`` function pointer; that function is then
called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.
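
The "free only after commit" strategy can be sketched as a small
userspace model. Everything here is hypothetical (toy_alloc,
toy_free_block(), toy_alloc_block() and toy_commit_callback() are names
invented for the illustration); it is not how ext4 or JBD2 implement it.
The only point is that a freed block stays unavailable until the
stand-in commit callback has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS	8
#define TOY_MAX_PENDING	8

/* Hypothetical in-memory allocator state; not a JBD2 or ext4 structure. */
struct toy_alloc {
	bool in_use[TOY_NBLOCKS];		/* what the allocator believes */
	size_t pending[TOY_MAX_PENDING];	/* freed, but not yet committed */
	size_t npending;
};

/* Delete path: remember the block, but keep it marked in use for now. */
bool toy_free_block(struct toy_alloc *a, size_t blk)
{
	if (blk >= TOY_NBLOCKS || !a->in_use[blk] ||
	    a->npending == TOY_MAX_PENDING)
		return false;
	a->pending[a->npending++] = blk;
	return true;
}

/* Allocation only ever hands out blocks that are really free. */
long toy_alloc_block(struct toy_alloc *a)
{
	for (size_t i = 0; i < TOY_NBLOCKS; i++) {
		if (!a->in_use[i]) {
			a->in_use[i] = true;
			return (long)i;
		}
	}
	return -1;
}

/* Stand-in for a commit callback: the frees are now safe to act on. */
void toy_commit_callback(struct toy_alloc *a)
{
	assert(a->npending <= TOY_MAX_PENDING);
	for (size_t i = 0; i < a->npending; i++)
		a->in_use[a->pending[i]] = false;
	a->npending = 0;
}
```

With every block in use, freeing block 3 does not make it allocatable
straight away; toy_alloc_block() only returns it after
toy_commit_callback() has run.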

JBD2 also provides a way to block all transaction updates via
jbd2_journal_lock_updates() /
jbd2_journal_unlock_updates(). Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::


        jbd2_journal_lock_updates()   // stop new stuff happening..
        jbd2_journal_flush()          // checkpoint everything.
        ..do stuff on stable fs
        jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DoS attacks with this should be obvious
if you allow unprivileged userspace to trigger codepaths containing
these calls.

Fast commits
~~~~~~~~~~~~

JBD2 also allows you to perform file-system specific delta commits known as
fast commits. In order to use fast commits, you will need to set the following
callbacks, which perform the corresponding work:

`journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and
fast commit.

`journal->j_fc_replay_cb`: Replay function called for replay of fast commit
blocks.

The file system is free to perform fast commits as and when it wants, as long
as it gets permission from JBD2 to do so by calling the function
:c:func:`jbd2_fc_begin_commit()`. Once a fast commit is done, the client
file system should tell JBD2 about it by calling
:c:func:`jbd2_fc_end_commit()`. If the file system wants JBD2 to perform a full
commit immediately after stopping the fast commit, it can do so by calling
:c:func:`jbd2_fc_end_commit_fallback()`. This is useful if the fast commit
operation fails for some reason and the only way to guarantee consistency is
for JBD2 to perform the full traditional commit.

JBD2 also provides helper functions to manage fast commit buffers. The file
system can use :c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()`
to allocate fast commit buffers and to wait for IO on them to complete.

Currently, only Ext4 implements fast commits. For details of its implementation
of fast commits, please refer to the top level comments in
fs/ext4/fast_commit.c.

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes,
namely each mount, each modification (transaction) and each changed
buffer, so that the journalling layer is told about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those which are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__