=====================
Linux Filesystems API
=====================

The Linux VFS
=============

The Filesystem types
--------------------

.. kernel-doc:: include/linux/fs.h
   :internal:

The Directory Cache
-------------------

.. kernel-doc:: fs/dcache.c
   :export:

.. kernel-doc:: include/linux/dcache.h
   :internal:

Inode Handling
--------------

.. kernel-doc:: fs/inode.c
   :export:

.. kernel-doc:: fs/bad_inode.c
   :export:

Registration and Superblocks
----------------------------

.. kernel-doc:: fs/super.c
   :export:

File Locks
----------

.. kernel-doc:: fs/locks.c
   :export:

.. kernel-doc:: fs/locks.c
   :internal:

Other Functions
---------------

.. kernel-doc:: fs/mpage.c
   :export:

.. kernel-doc:: fs/namei.c
   :export:

.. kernel-doc:: fs/buffer.c
   :export:

.. kernel-doc:: block/bio.c
   :export:

.. kernel-doc:: fs/seq_file.c
   :export:

.. kernel-doc:: fs/filesystems.c
   :export:

.. kernel-doc:: fs/fs-writeback.c
   :export:

.. kernel-doc:: fs/block_dev.c
   :export:

.. kernel-doc:: fs/anon_inodes.c
   :export:

.. kernel-doc:: fs/attr.c
   :export:

.. kernel-doc:: fs/d_path.c
   :export:

.. kernel-doc:: fs/dax.c
   :export:

.. kernel-doc:: fs/direct-io.c
   :export:

.. kernel-doc:: fs/file_table.c
   :export:

.. kernel-doc:: fs/libfs.c
   :export:

.. kernel-doc:: fs/posix_acl.c
   :export:

.. kernel-doc:: fs/stat.c
   :export:

.. kernel-doc:: fs/sync.c
   :export:

.. kernel-doc:: fs/xattr.c
   :export:

The proc filesystem
===================

sysctl interface
----------------

.. kernel-doc:: kernel/sysctl.c
   :export:

proc filesystem interface
-------------------------

.. kernel-doc:: fs/proc/base.c
   :internal:

Events based on file descriptors
================================

.. kernel-doc:: fs/eventfd.c
   :export:

The Filesystem for Exporting Kernel Objects
===========================================

.. kernel-doc:: fs/sysfs/file.c
   :export:

.. kernel-doc:: fs/sysfs/symlink.c
   :export:

The debugfs filesystem
======================

debugfs interface
-----------------

.. kernel-doc:: fs/debugfs/inode.c
   :export:

.. kernel-doc:: fs/debugfs/file.c
   :export:

The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. First of all you need to create a
journal_t data structure. There are two calls to do this, depending on
how you decide to allocate the physical media on which the journal
resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
filesystem inodes, and the :c:func:`jbd2_journal_init_dev` call can be used
for a journal stored on a raw device (in a contiguous range of blocks).
Both calls return a pointer to a journal_t (a typedef for a struct), so
when you are finally finished make sure you call
:c:func:`jbd2_journal_destroy` on it to free up any used kernel memory.

Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects that the space for the journal
has already been allocated and initialized properly by the userspace tools.
When loading the journal you must call :c:func:`jbd2_journal_load` to process
the journal contents. If the client file system detects that the journal
contents do not need to be processed (or need not even have valid
contents), it may call :c:func:`jbd2_journal_wipe` to clear the journal
contents before calling :c:func:`jbd2_journal_load`.

Note that jbd2_journal_wipe(..,0) calls
:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
transactions in the journal, and similarly :c:func:`jbd2_journal_load` will
call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
:c:func:`ext4_load_journal` in fs/ext4/super.c for examples of this stage.

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes; this is done
by wrapping them into transactions. Additionally you need to wrap
the modification of each of the buffers with calls to the journal layer,
so it knows what modifications you are actually making. To do
this use :c:func:`jbd2_journal_start`, which returns a transaction handle.

:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
which indicates the end of a transaction, are nestable calls, so you can
reenter a transaction if necessary, but remember you must call
:c:func:`jbd2_journal_stop` the same number of times as
:c:func:`jbd2_journal_start` before the transaction is completed (or more
accurately leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.
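
The nesting rule can be pictured with a small userspace toy model. This is
only a sketch and not the JBD2 implementation: ``toy_handle``,
``toy_start`` and ``toy_stop`` are hypothetical names invented for the
illustration, and the single global pointer stands in for "the current
task's handle". It assumes that a nested start simply hands back the
already open handle and bumps a counter, so that only the outermost stop
completes the update.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a per-task transaction handle (hypothetical type). */
struct toy_handle {
    int refs;       /* number of toy_start() calls still open */
    int completed;  /* set once the outermost toy_stop() has run */
};

/* Toy stand-in for the task's "current handle" slot. */
static struct toy_handle *toy_current;

/* Begin or re-enter a transaction; nested calls return the open handle. */
static struct toy_handle *toy_start(struct toy_handle *storage)
{
    if (toy_current) {
        toy_current->refs++;
        return toy_current;
    }
    storage->refs = 1;
    storage->completed = 0;
    toy_current = storage;
    return storage;
}

/* Drop one reference; returns 1 only when the outermost stop has run. */
static int toy_stop(struct toy_handle *h)
{
    assert(h != NULL && h->refs > 0);
    if (--h->refs > 0)
        return 0;
    h->completed = 1;
    toy_current = NULL;
    return 1;
}
```

In this model a helper that calls ``toy_start``/``toy_stop`` internally can
be used both on its own and from inside a caller that already holds a
handle, which is the convenience the nesting is meant to provide.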

Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call :c:func:`jbd2_journal_get_create_access()` /
:c:func:`jbd2_journal_get_write_access()` /
:c:func:`jbd2_journal_get_undo_access()` as appropriate; this allows the
journalling layer to copy the unmodified
data if it needs to. After all, the buffer may be part of a previously
uncommitted transaction. At this point you are at last ready to modify a
buffer, and once you have done so you need to call
:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a
buffer you now know is no longer required to be pushed back on the
device you can call :c:func:`jbd2_journal_forget` in much the same way as you
might have used :c:func:`bforget` in the past.

A :c:func:`jbd2_journal_flush` may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your :c:func:`put_super`, you can call
:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember, nothing commits
until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
the transaction at the end of each file/inode/address etc. operation you
perform, so that the journalling system isn't re-entered on another
journal. This matters because transactions can't be nested/batched across
differing journals, and a filesystem other than yours (say ext4) may be
modified in a later syscall.

The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param) - when it blocks it merely(!) needs to wait
for transactions to complete and be committed from other tasks, so
essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
deadlocks you must treat :c:func:`jbd2_journal_start` /
:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
your semaphore ordering rules. Note that :c:func:`jbd2_journal_extend` has
similar blocking behaviour to :c:func:`jbd2_journal_start`, so you can
deadlock here just as easily as on :c:func:`jbd2_journal_start`.

Try to reserve the right number of blocks the first time. ;-). This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at least at ext4_jbd.h to see the
basis on which ext4 makes these decisions.

Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reuse these blocks and a crash happens,
there is no way to restore the contents of the reallocated blocks at the
end of the last fully committed transaction. One simple way of doing
this is to mark blocks as free in internal in-memory block allocation
structures only after the transaction freeing them commits. Ext4 uses
the journal commit callback for this purpose.
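
The "free only after commit" rule can be sketched with a toy userspace
allocator. Everything here is hypothetical and exists only for the
illustration (``toy_alloc``, ``toy_free_block``, ``toy_commit``, the fixed
block count); it is not how ext4 or JBD2 store this state. The sketch
assumes a block freed inside a transaction is only remembered as pending,
and is handed back to the allocator when the commit notification arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 8

/* Hypothetical in-memory allocator state, not a jbd2 or ext4 structure. */
struct toy_alloc {
    bool in_use[TOY_NBLOCKS];        /* blocks the allocator may not hand out */
    bool pending_free[TOY_NBLOCKS];  /* freed by the not yet committed transaction */
};

/* Hand out the lowest free block, or -1 if none is available. */
static int toy_alloc_block(struct toy_alloc *a)
{
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        if (!a->in_use[i]) {
            a->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Free inside a transaction: remember the block, but keep it reserved. */
static void toy_free_block(struct toy_alloc *a, int blk)
{
    assert(blk >= 0 && blk < TOY_NBLOCKS && a->in_use[blk]);
    a->pending_free[blk] = true;
}

/* Called once the freeing transaction has committed: release for reuse. */
static void toy_commit(struct toy_alloc *a)
{
    for (int i = 0; i < TOY_NBLOCKS; i++) {
        if (a->pending_free[i]) {
            a->pending_free[i] = false;
            a->in_use[i] = false;
        }
    }
}
```

Until ``toy_commit`` runs, an allocation request skips the freed block, so
a crash before the commit cannot leave new data in a block that the last
committed state still refers to.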

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You ask the journalling
layer to call the callback by simply setting the
``journal->j_commit_callback`` function pointer, and that function is
called after each transaction commit. You can also use
``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.

JBD2 also provides a way to block all transaction updates via
:c:func:`jbd2_journal_lock_updates()` /
:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::

        jbd2_journal_lock_updates()   // stop new stuff happening..
        jbd2_journal_flush()          // checkpoint everything.
        ..do stuff on stable fs
        jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DoS attacks with this should be obvious
if you allow unprivileged userspace to trigger codepaths containing
these calls.

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes,
namely each mount, each modification (transaction) and each changed
buffer, to tell the journalling layer about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those which are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__

splice API
==========

splice is a method for moving blocks of data around inside the kernel,
without continually transferring them between the kernel and user space.

.. kernel-doc:: fs/splice.c

pipes API
=========

Pipe interfaces are all for in-kernel (builtin image) use. They are not
exported for use by modules.

.. kernel-doc:: include/linux/pipe_fs_i.h
   :internal:

.. kernel-doc:: fs/pipe.c

Encryption API
==============

A library which filesystems can hook into to support transparent
encryption of files and directories.

.. toctree::
    :maxdepth: 2

    fscrypt
362