=======
Locking
=======

The text below describes the locking rules for VFS-related methods.
It is (believed to be) up-to-date. *Please*, if you change anything in
prototypes or locking protocols - update this file. And update the relevant
instances in the tree, don't leave that to maintainers of filesystems/devices/
etc. At the very least, put the list of dubious cases at the end of this file.
Don't turn it into a log - maintainers of out-of-tree code are supposed to
be able to use diff(1).

Thing currently missing here: socket operations. Alexey?

dentry_operations
=================

prototypes::

	int (*d_revalidate)(struct dentry *, unsigned int);
	int (*d_weak_revalidate)(struct dentry *, unsigned int);
	int (*d_hash)(const struct dentry *, struct qstr *);
	int (*d_compare)(const struct dentry *,
			unsigned int, const char *, const struct qstr *);
	int (*d_delete)(struct dentry *);
	int (*d_init)(struct dentry *);
	void (*d_release)(struct dentry *);
	void (*d_prune)(struct dentry *);
	void (*d_iput)(struct dentry *, struct inode *);
	char *(*d_dname)(struct dentry *dentry, char *buffer, int buflen);
	struct vfsmount *(*d_automount)(struct path *path);
	int (*d_manage)(const struct path *, bool);
	struct dentry *(*d_real)(struct dentry *, const struct inode *);

locking rules:

================== ===========	========	==============	========
ops		   rename_lock	->d_lock	may block	rcu-walk
================== ===========	========	==============	========
d_revalidate:	   no		no		yes (ref-walk)	maybe
d_weak_revalidate: no		no		yes		no
d_hash:		   no		no		no		maybe
d_compare:	   yes		no		no		maybe
d_delete:	   no		yes		no		no
d_init:		   no		no		yes		no
d_release:	   no		no		yes		no
d_prune:	   no		yes		no		no
d_iput:		   no		no		yes		no
d_dname:	   no		no		no		no
d_automount:	   no		no		yes		no
d_manage:	   no		no		yes (ref-walk)	maybe
d_real:		   no		no		yes		no
================== ===========	========	==============	========

inode_operations
================

prototypes::

	int (*create) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t, bool);
	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
	int (*link) (struct dentry *,struct inode *,struct dentry *);
	int (*unlink) (struct inode *,struct dentry *);
	int (*symlink) (struct mnt_idmap *, struct inode *,struct dentry *,const char *);
	int (*mkdir) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t);
	int (*rmdir) (struct inode *,struct dentry *);
	int (*mknod) (struct mnt_idmap *, struct inode *,struct dentry *,umode_t,dev_t);
	int (*rename) (struct mnt_idmap *, struct inode *, struct dentry *,
			struct inode *, struct dentry *, unsigned int);
	int (*readlink) (struct dentry *, char __user *,int);
	const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *);
	void (*truncate) (struct inode *);
	int (*permission) (struct mnt_idmap *, struct inode *, int, unsigned int);
	struct posix_acl * (*get_inode_acl)(struct inode *, int, bool);
	int (*setattr) (struct mnt_idmap *, struct dentry *, struct iattr *);
	int (*getattr) (struct mnt_idmap *, const struct path *, struct kstat *, u32, unsigned int);
	ssize_t (*listxattr) (struct dentry *, char *, size_t);
	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
	void (*update_time)(struct inode *, struct timespec *, int);
	int (*atomic_open)(struct inode *, struct dentry *,
				struct file *, unsigned open_flag,
				umode_t create_mode);
	int (*tmpfile) (struct mnt_idmap *, struct inode *,
			struct file *, umode_t);
	int (*fileattr_set)(struct mnt_idmap *idmap,
			    struct dentry *dentry, struct fileattr *fa);
	int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
	struct posix_acl * (*get_acl)(struct mnt_idmap *, struct dentry *, int);
	struct offset_ctx *(*get_offset_ctx)(struct inode *inode);

locking rules:
	all may block

==============	==================================================
ops		i_rwsem(inode)
==============	==================================================
lookup:		shared
create:		exclusive
link:		exclusive (both)
mknod:		exclusive
symlink:	exclusive
mkdir:		exclusive
unlink:		exclusive (both)
rmdir:		exclusive (both) (see below)
rename:		exclusive (both parents, some children)	(see below)
readlink:	no
get_link:	no
setattr:	exclusive
permission:	no (may not block if called in rcu-walk mode)
get_inode_acl:	no
get_acl:	no
getattr:	no
listxattr:	no
fiemap:		no
update_time:	no
atomic_open:	shared (exclusive if O_CREAT is set in open flags)
tmpfile:	no
fileattr_get:	no or exclusive
fileattr_set:	exclusive
get_offset_ctx	no
==============	==================================================


	Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
	exclusive on victim.
	cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
	->unlink() and ->rename() have ->i_rwsem exclusive on all non-directories
	involved.
	->rename() has ->i_rwsem exclusive on any subdirectory that changes parent.

See Documentation/filesystems/directory-locking.rst for more detailed discussion
of the locking scheme for directory operations.
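
As a reading aid, the ->atomic_open() row of the table above can be restated
as a small userspace function.  This is only a sketch: ``enum i_rwsem_mode``
and ``atomic_open_i_rwsem_mode()`` are hypothetical names invented for this
illustration and are not kernel interfaces; the only real identifier used is
``O_CREAT`` from <fcntl.h>::

	#include <fcntl.h>

	/* Hypothetical names, for illustration only; not kernel API. */
	enum i_rwsem_mode {
		I_RWSEM_SHARED,
		I_RWSEM_EXCLUSIVE,
	};

	/*
	 * How i_rwsem of the inode passed to ->atomic_open() is held by the
	 * caller: shared, or exclusive if O_CREAT is set in the open flags.
	 */
	static enum i_rwsem_mode atomic_open_i_rwsem_mode(unsigned int open_flag)
	{
		return (open_flag & O_CREAT) ? I_RWSEM_EXCLUSIVE : I_RWSEM_SHARED;
	}

The method itself never takes the lock; the function only describes what the
caller already holds when the method runs.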

xattr_handler operations
========================

prototypes::

	bool (*list)(struct dentry *dentry);
	int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
		   struct inode *inode, const char *name, void *buffer,
		   size_t size);
	int (*set)(const struct xattr_handler *handler,
		   struct mnt_idmap *idmap,
		   struct dentry *dentry, struct inode *inode, const char *name,
		   const void *buffer, size_t size, int flags);

locking rules:
	all may block

=====		==============
ops		i_rwsem(inode)
=====		==============
list:		no
get:		no
set:		exclusive
=====		==============

super_operations
================

prototypes::

	struct inode *(*alloc_inode)(struct super_block *sb);
	void (*free_inode)(struct inode *);
	void (*destroy_inode)(struct inode *);
	void (*dirty_inode) (struct inode *, int flags);
	int (*write_inode) (struct inode *, struct writeback_control *wbc);
	int (*drop_inode) (struct inode *);
	void (*evict_inode) (struct inode *);
	void (*put_super) (struct super_block *);
	int (*sync_fs)(struct super_block *sb, int wait);
	int (*freeze_fs) (struct super_block *);
	int (*unfreeze_fs) (struct super_block *);
	int (*statfs) (struct dentry *, struct kstatfs *);
	int (*remount_fs) (struct super_block *, int *, char *);
	void (*umount_begin) (struct super_block *);
	int (*show_options)(struct seq_file *, struct dentry *);
	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);

locking rules:
	All may block [not true, see below]

======================	============	========================
ops			s_umount	note
======================	============	========================
alloc_inode:
free_inode:				called from RCU callback
destroy_inode:
dirty_inode:
write_inode:
drop_inode:				!!!inode->i_lock!!!
evict_inode:
put_super:		write
sync_fs:		read
freeze_fs:		write
unfreeze_fs:		write
statfs:			maybe(read)	(see below)
remount_fs:		write
umount_begin:		no
show_options:		no		(namespace_sem)
quota_read:		no		(see below)
quota_write:		no		(see below)
======================	============	========================

->statfs() has s_umount (shared) when called by ustat(2) (native or
compat), but that's an accident of bad API; s_umount is used to pin
the superblock down when we only have dev_t given to us by userland to
identify the superblock.  Everything else (statfs(), fstatfs(), etc.)
doesn't hold it when calling ->statfs() - superblock is pinned down
by resolving the pathname passed to syscall.

->quota_read() and ->quota_write() functions are both guaranteed to
be the only ones operating on the quota file by the quota code (via
dqio_sem) (unless an admin really wants to screw up something and
writes to quota files with quotas on). For other details about locking
see also dquot_operations section.

file_system_type
================

prototypes::

	struct dentry *(*mount) (struct file_system_type *, int,
		       const char *, void *);
	void (*kill_sb) (struct super_block *);

locking rules:

=======		=========
ops		may block
=======		=========
mount		yes
kill_sb		yes
=======		=========

->mount() returns ERR_PTR or the root dentry; its superblock should be locked
on return.

->kill_sb() takes a write-locked superblock, does all shutdown work on it,
unlocks and drops the reference.

address_space_operations
========================

prototypes::

	int (*writepage)(struct page *page, struct writeback_control *wbc);
	int (*read_folio)(struct file *, struct folio *);
	int (*writepages)(struct address_space *, struct writeback_control *);
	bool (*dirty_folio)(struct address_space *, struct folio *folio);
	void (*readahead)(struct readahead_control *);
	int (*write_begin)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len,
				struct page **pagep, void **fsdata);
	int (*write_end)(struct file *, struct address_space *mapping,
				loff_t pos, unsigned len, unsigned copied,
				struct page *page, void *fsdata);
	sector_t (*bmap)(struct address_space *, sector_t);
	void (*invalidate_folio) (struct folio *, size_t start, size_t len);
	bool (*release_folio)(struct folio *, gfp_t);
	void (*free_folio)(struct folio *);
	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
	int (*migrate_folio)(struct address_space *, struct folio *dst,
			struct folio *src, enum migrate_mode);
	int (*launder_folio)(struct folio *);
	bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
	int (*error_remove_page)(struct address_space *, struct page *);
	int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span);
	int (*swap_deactivate)(struct file *);
	int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);

locking rules:
	All except dirty_folio and free_folio may block

======================	======================== =========	===============
ops			folio locked		 i_rwsem	invalidate_lock
======================	======================== =========	===============
writepage:		yes, unlocks (see below)
read_folio:		yes, unlocks				shared
writepages:
dirty_folio:		maybe
readahead:		yes, unlocks				shared
write_begin:		locks the page		 exclusive
write_end:		yes, unlocks		 exclusive
bmap:
invalidate_folio:	yes					exclusive
release_folio:		yes
free_folio:		yes
direct_IO:
migrate_folio:		yes (both)
launder_folio:		yes
is_partially_uptodate:	yes
error_remove_page:	yes
swap_activate:		no
swap_deactivate:	no
swap_rw:		yes, unlocks
======================	======================== =========	===============

->write_begin(), ->write_end() and ->read_folio() may be called from
the request handler (/dev/loop).

->read_folio() unlocks the folio, either synchronously or via I/O
completion.

->readahead() unlocks the folios that I/O is attempted on like ->read_folio().

->writepage() is used for two purposes: for "memory cleansing" and for
"sync".  These are quite different operations and the behaviour may differ
depending upon the mode.

If writepage is called for sync (wbc->sync_mode != WB_SYNC_NONE) then
it *must* start I/O against the page, even if that would involve
blocking on in-progress I/O.

If writepage is called for memory cleansing (sync_mode ==
WB_SYNC_NONE) then its role is to get as much writeout underway as
possible.  So writepage should try to avoid blocking against
currently-in-progress I/O.

If the filesystem is not called for "sync" and it determines that it
would need to block against in-progress I/O to be able to start new I/O
against the page, the filesystem should redirty the page with
redirty_page_for_writepage(), then unlock the page and return zero.
This may also be done to avoid internal deadlocks, but rarely.

If the filesystem is called for sync then it must wait on any
in-progress I/O and then start new I/O.

The filesystem should unlock the page synchronously, before returning to the
caller, unless ->writepage() returns the special WRITEPAGE_ACTIVATE
value. WRITEPAGE_ACTIVATE means that the page cannot really be written out
currently, and the VM should stop calling ->writepage() on this page for some
time. The VM does this by moving the page to the head of the active list,
hence the name.

Unless the filesystem is going to redirty_page_for_writepage(), unlock the page
and return zero, writepage *must* run set_page_writeback() against the page,
followed by unlocking it.  Once set_page_writeback() has been run against the
page, write I/O can be submitted and the write I/O completion handler must run
end_page_writeback() once the I/O is complete.  If no I/O is submitted, the
filesystem must run end_page_writeback() against the page before returning from
writepage.

That is: after 2.5.12, pages which are under writeout are *not* locked.  Note,
if the filesystem needs the page to be locked during writeout, that is ok, too,
the page is allowed to be unlocked at any point in time between the calls to
set_page_writeback() and end_page_writeback().

Note, failure to run either redirty_page_for_writepage() or the combination of
set_page_writeback()/end_page_writeback() on a page submitted to writepage
will leave the page itself marked clean but it will be tagged as dirty in the
radix tree.  This incoherency can lead to all sorts of hard-to-debug problems
in the filesystem like having dirty inodes at umount and losing written data.
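
The ->writepage() rules above can be condensed into a small userspace model.
It is only a sketch under stated assumptions: ``struct model_page``,
``enum model_sync_mode`` and every ``model_*()`` function are hypothetical
stand-ins invented for this illustration, the wait for in-progress I/O is
reduced to clearing a flag, and WRITEPAGE_ACTIVATE is not modelled::

	#include <stdbool.h>

	/* Userspace stand-ins; none of these are the kernel's definitions. */
	enum model_sync_mode { MODEL_SYNC_NONE, MODEL_SYNC_ALL };

	struct model_page {
		bool locked;		/* page lock, held by the caller on entry */
		bool dirty;		/* set again if the page is redirtied */
		bool writeback;		/* set between start and end of writeout */
		bool io_in_flight;	/* I/O against the page is still running */
	};

	/* Model of ->writepage(); returns zero in every case shown here. */
	static int model_writepage(struct model_page *p, enum model_sync_mode mode)
	{
		if (p->io_in_flight) {
			if (mode == MODEL_SYNC_NONE) {
				/* Memory cleansing: do not block. Redirty,
				 * unlock and return zero. */
				p->dirty = true;
				p->locked = false;
				return 0;
			}
			/* Sync: wait for the in-progress I/O first. */
			p->io_in_flight = false;
		}

		p->writeback = true;	/* stands in for set_page_writeback() */
		p->locked = false;	/* unlock before returning */
		p->io_in_flight = true;	/* new write I/O submitted */
		return 0;
	}

	/* Model of the write I/O completion handler. */
	static void model_end_io(struct model_page *p)
	{
		p->io_in_flight = false;
		p->writeback = false;	/* stands in for end_page_writeback() */
	}

In both branches the page is unlocked before the function returns, and the
writeback flag is only ever cleared by the completion handler.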

->writepages() is used for periodic writeback and for syscall-initiated
sync operations.  The address_space should start I/O against at least
``*nr_to_write`` pages.  ``*nr_to_write`` must be decremented for each page
which is written.  The address_space implementation may write more (or less)
pages than ``*nr_to_write`` asks for, but it should try to be reasonably close.
If nr_to_write is NULL, all dirty pages must be written.

writepages should _only_ write pages which are present on
mapping->io_pages.
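
The ``*nr_to_write`` accounting described above can be pictured with a short
userspace model.  This is a sketch only: ``model_writepages()`` and the array
of dirty flags are hypothetical, the budget is passed as a pointer simply to
follow the notation used in the text, and the real struct writeback_control
is not modelled::

	#include <stdbool.h>
	#include <stddef.h>

	/*
	 * Hypothetical model: "write" dirty pages until the budget runs out.
	 * A NULL budget means that all dirty pages must be written.
	 * Returns the number of pages written.
	 */
	static long model_writepages(bool *dirty, size_t npages, long *nr_to_write)
	{
		long written = 0;
		size_t i;

		for (i = 0; i < npages; i++) {
			if (nr_to_write && *nr_to_write <= 0)
				break;		/* budget used up */
			if (!dirty[i])
				continue;	/* nothing to write here */
			dirty[i] = false;	/* stands in for starting I/O */
			written++;
			if (nr_to_write)
				(*nr_to_write)--;	/* one per page written */
		}
		return written;
	}

This model stops exactly when the budget reaches zero; the text allows a real
implementation to write somewhat more or less than requested.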

->dirty_folio() is called from various places in the kernel when
the target folio is marked as needing writeback.  The folio cannot be
truncated because either the caller holds the folio lock, or the caller
has found the folio while holding the page table lock which will block
truncation.

->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
filesystems and by the swapper. The latter will eventually go away.  Please,
keep it that way and don't breed new callers.

->invalidate_folio() is called when the filesystem must attempt to drop
some or all of the buffers from the folio when it is being truncated.
The filesystem must exclusively acquire
invalidate_lock before invalidating page cache in truncate / hole punch
path (and thus calling into ->invalidate_folio) to block races between page
cache invalidation and page cache filling functions (fault, read, ...).

->release_folio() is called when the MM wants to make a change to the
folio that would invalidate the filesystem's private data.  For example,
it may be about to be removed from the address_space or split.  The folio
is locked and not under writeback.  It may be dirty.  The gfp parameter
is not usually used for allocation, but rather to indicate what the
filesystem may do to attempt to free the private data.  The filesystem may
return false to indicate that the folio's private data cannot be freed.
If it returns true, it should have already removed the private data from
the folio.  If a filesystem does not provide a ->release_folio method,
the pagecache will assume that private data is buffer_heads and call
try_to_free_buffers().
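
The return-value contract of ->release_folio() can be sketched in userspace.
Everything here is an assumption made for illustration: ``struct model_folio``
and ``model_release_folio()`` are hypothetical, the ``may_block`` flag is a
crude stand-in for the gfp parameter, and refusing dirty folios when blocking
is not allowed is an arbitrary policy, not what any particular filesystem
does::

	#include <stdbool.h>
	#include <stdlib.h>

	/* Hypothetical stand-in for a folio with filesystem private data. */
	struct model_folio {
		bool dirty;
		void *private_data;
	};

	/*
	 * Returns false if the private data cannot be freed right now.
	 * Returns true only after the private data has been removed.
	 */
	static bool model_release_folio(struct model_folio *folio, bool may_block)
	{
		if (folio->dirty && !may_block)
			return false;	/* private data stays attached */

		free(folio->private_data);
		folio->private_data = NULL;
		return true;
	}

The point of the model is the ordering: by the time true is returned, the
private data is already gone from the folio.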

->free_folio() is called when the kernel has dropped the folio
from the page cache.

->launder_folio() may be called prior to releasing a folio if
it is still found to be dirty. It returns zero if the folio was successfully
cleaned, or an error value if not. Note that in order to prevent the folio
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.

->swap_activate() will be called to prepare the given file for swap.  It
should perform any validation and preparation necessary to ensure that
writes can be performed with minimal memory allocation.  It should call
add_swap_extent(), or the helper iomap_swapfile_activate(), and return
the number of extents added.  If IO should be submitted through
->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted
directly to the block device ``sis->bdev``.

->swap_deactivate() will be called in the sys_swapoff()
path after ->swap_activate() returned success.

->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate().
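
The relationship between ->swap_activate(), SWP_FS_OPS and ->swap_rw() can be
sketched as a userspace model.  All names below are hypothetical stand-ins:
``MODEL_SWP_FS_OPS`` has an arbitrary value rather than the kernel's,
``model_add_swap_extent()`` only counts extents and does not mirror the
signature of add_swap_extent(), and ``struct model_swap_info`` is not
struct swap_info_struct::

	#include <stdbool.h>

	#define MODEL_SWP_FS_OPS 0x1u	/* stand-in flag value, not the kernel's */

	struct model_swap_info {
		unsigned int flags;
		int nr_extents;
	};

	/* Stand-in for add_swap_extent(): records one extent, returns 1. */
	static int model_add_swap_extent(struct model_swap_info *sis)
	{
		sis->nr_extents++;
		return 1;
	}

	/* Model of ->swap_activate(): returns the number of extents added. */
	static int model_swap_activate(struct model_swap_info *sis,
				       int nr_file_extents, bool use_fs_ops)
	{
		int added = 0;
		int i;

		for (i = 0; i < nr_file_extents; i++)
			added += model_add_swap_extent(sis);
		if (use_fs_ops)
			sis->flags |= MODEL_SWP_FS_OPS;	/* IO goes via ->swap_rw() */
		return added;
	}

	/* True if swap IO would be routed through ->swap_rw(). */
	static bool model_swap_io_uses_swap_rw(const struct model_swap_info *sis)
	{
		return (sis->flags & MODEL_SWP_FS_OPS) != 0;
	}

When the flag is left clear, the model corresponds to IO being submitted
directly to the block device.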

file_lock_operations
====================

prototypes::

	void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
	void (*fl_release_private)(struct file_lock *);


locking rules:

===================	=============	=========
ops			inode->i_lock	may block
===================	=============	=========
fl_copy_lock:		yes		no
fl_release_private:	maybe		maybe [1]_
===================	=============	=========

.. [1]
   ->fl_release_private for flock or POSIX locks is currently allowed
   to block. Leases however can still be freed while the i_lock is held and
   so fl_release_private called on a lease should not block.

lock_manager_operations
=======================

prototypes::

	void (*lm_notify)(struct file_lock *);  /* unblock callback */
	int (*lm_grant)(struct file_lock *, struct file_lock *, int);
	void (*lm_break)(struct file_lock *); /* break_lease callback */
	int (*lm_change)(struct file_lock **, int);
	bool (*lm_breaker_owns_lease)(struct file_lock *);
	bool (*lm_lock_expirable)(struct file_lock *);
	void (*lm_expire_lock)(void);

locking rules:

======================	=============	=================	=========
ops			flc_lock	blocked_lock_lock	may block
======================	=============	=================	=========
lm_notify:		no		yes			no
lm_grant:		no		no			no
lm_break:		yes		no			no
lm_change:		yes		no			no
lm_breaker_owns_lease:	yes		no			no
lm_lock_expirable:	yes		no			no
lm_expire_lock:		no		no			yes
======================	=============	=================	=========

buffer_head
===========

prototypes::

	void (*b_end_io)(struct buffer_head *bh, int uptodate);

locking rules:

Called from interrupt context, so extreme care is needed here. The bh is
locked, but that is the only guarantee we have. Currently only RAID1,
highmem, fs/buffer.c, and fs/ntfs/aops.c provide this method. Block devices
call it upon I/O completion.

block_device_operations
=======================

prototypes::

	int (*open) (struct block_device *, fmode_t);
	int (*release) (struct gendisk *, fmode_t);
	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
	int (*direct_access) (struct block_device *, sector_t, void **,
				unsigned long *);
	void (*unlock_native_capacity) (struct gendisk *);
	int (*getgeo)(struct block_device *, struct hd_geometry *);
	void (*swap_slot_free_notify) (struct block_device *, unsigned long);

locking rules:

======================= ===================
ops			open_mutex
======================= ===================
open:			yes
release:		yes
ioctl:			no
compat_ioctl:		no
direct_access:		no
unlock_native_capacity:	no
getgeo:			no
swap_slot_free_notify:	no	(see below)
======================= ===================

swap_slot_free_notify is called with swap_lock and sometimes the page lock
held.


file_operations
===============

prototypes::

	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
	int (*iopoll) (struct kiocb *kiocb, bool spin);
	int (*iterate_shared) (struct file *, struct dir_context *);
	__poll_t (*poll) (struct file *, struct poll_table_struct *);
	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	unsigned long (*get_unmapped_area)(struct file *, unsigned long,
			unsigned long, unsigned long, unsigned long);
	int (*check_flags)(int);
	int (*flock) (struct file *, int, struct file_lock *);
	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *,
			size_t, unsigned int);
	ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *,
			size_t, unsigned int);
	int (*setlease)(struct file *, long, struct file_lock **, void **);
	long (*fallocate)(struct file *, int, loff_t, loff_t);
	void (*show_fdinfo)(struct seq_file *m, struct file *f);
	unsigned (*mmap_capabilities)(struct file *);
	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *,
			loff_t, size_t, unsigned int);
	loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
			struct file *file_out, loff_t pos_out,
			loff_t len, unsigned int remap_flags);
	int (*fadvise)(struct file *, loff_t, loff_t, int);

locking rules:
	All may block.

->llseek() locking has moved from llseek itself to the individual ->llseek()
implementations.  If your fs is not using generic_file_llseek, you
need to acquire and release the appropriate locks in your ->llseek().
For many filesystems, it is probably safe to acquire the inode
mutex or just to use i_size_read() instead.
Note: this does not protect file->f_pos against concurrent modifications,
since that is something userspace has to take care of.
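
As a rough illustration of the rule above, the following userspace model
shows an ->llseek() that takes the lock itself, and only where it needs a
stable file size. Everything in it (struct toy_inode, struct toy_file,
toy_llseek()) is hypothetical, and a pthread mutex stands in for the inode
lock; it sketches the shape of the locking and is not kernel code.

.. code-block:: c

	#include <assert.h>
	#include <errno.h>
	#include <pthread.h>
	#include <stddef.h>
	#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

	/* Hypothetical stand-ins for struct inode and struct file. */
	struct toy_inode {
		pthread_mutex_t lock;	/* plays the role of the inode lock */
		long long size;		/* plays the role of i_size */
	};

	struct toy_file {
		struct toy_inode *inode;
		long long pos;		/* plays the role of f_pos */
	};

	/*
	 * The method takes and releases the lock itself; the caller holds
	 * nothing.  Concurrent updates of pos are not serialized here.
	 */
	static long long toy_llseek(struct toy_file *file, long long offset,
				    int whence)
	{
		long long pos;

		assert(file != NULL && file->inode != NULL);

		switch (whence) {
		case SEEK_SET:
			pos = offset;
			break;
		case SEEK_CUR:
			pos = file->pos + offset;
			break;
		case SEEK_END:
			/* Only this case depends on the size, so only it locks. */
			pthread_mutex_lock(&file->inode->lock);
			pos = file->inode->size + offset;
			pthread_mutex_unlock(&file->inode->lock);
			break;
		default:
			return -EINVAL;
		}
		if (pos < 0)
			return -EINVAL;
		file->pos = pos;
		return pos;
	}

In this model a failed seek leaves the position untouched, and two threads
seeking on the same toy_file can still race on pos, which matches the note
above about f_pos.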

->iterate_shared() is called with i_rwsem held for reading, and with the
file f_pos_lock held exclusively.

->fasync() is responsible for maintaining the FASYNC bit in filp->f_flags.
Most instances call fasync_helper(), which does that maintenance, so it's
not normally something one needs to worry about.  Return values > 0 will be
mapped to zero in the VFS layer.
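
The return value rule can be pictured with a small stand-alone model. The
helper name toy_map_fasync_result() is made up for this sketch; it only
mirrors the rule stated above (negative errors pass through, positive
values become zero) and is not the VFS code itself.

.. code-block:: c

	#include <errno.h>

	/*
	 * Hypothetical model of what the caller does with the value
	 * returned by ->fasync(): errors are passed on, anything
	 * positive reads as success.
	 */
	static int toy_map_fasync_result(int ret)
	{
		if (ret < 0)
			return ret;	/* real error, report it */
		return 0;		/* 0 and "> 0" both mean success */
	}

An instance is therefore free to return a positive value from its helper
without userspace ever seeing it.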

->readdir() and ->ioctl() on directories must be changed. Ideally we would
move ->readdir() to inode_operations and use a separate method for directory
->ioctl() or kill the latter completely. One of the problems is that for
anything that resembles union-mount we won't have a struct file for all
components. And there are other reasons why the current interface is a mess...

->read on directories probably must go away - we should just enforce -EISDIR
in sys_read() and friends.

->setlease operations should call generic_setlease() before or after setting
the lease within the individual filesystem to record the result of the
operation.

A ->fallocate implementation must be very careful to maintain page cache
consistency when punching holes or performing other operations that invalidate
page cache contents. Usually the filesystem needs to call
truncate_inode_pages_range() to invalidate the relevant range of the page
cache. However, the filesystem usually also needs to update its internal (and
on-disk) view of the file offset -> disk block mapping. Until this update is
finished, the filesystem needs to block page faults and reads from reloading
now-stale page cache contents from the disk. Since VFS acquires
mapping->invalidate_lock in shared mode when loading pages from disk
(filemap_fault(), filemap_read(), readahead paths), the fallocate
implementation must take the invalidate_lock in exclusive mode to prevent
reloading.
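
The ordering described above can be sketched with a single-threaded
userspace model. All names here (struct toy_mapping, toy_fault(),
toy_punch_hole()) are hypothetical; the "lock" is only a pair of state
fields, and a refused fault stands in for a fault that would sleep on the
real lock.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <string.h>

	#define TOY_NPAGES 8

	/* Hypothetical model: it tracks lock state, it does no real locking. */
	struct toy_mapping {
		int invalidate_readers;		/* shared holders of invalidate_lock */
		bool invalidate_writer;		/* exclusive holder of invalidate_lock */
		bool cached[TOY_NPAGES];	/* page cache: is page i present? */
		bool mapped[TOY_NPAGES];	/* fs view: does page i have a block? */
	};

	/* Fault path: load page i from "disk" under the shared lock. */
	static bool toy_fault(struct toy_mapping *m, int i)
	{
		if (m->invalidate_writer)
			return false;		/* hole punch in progress */
		m->invalidate_readers++;	/* lock shared */
		m->cached[i] = m->mapped[i];	/* holes are not cached here */
		m->invalidate_readers--;	/* unlock */
		return m->cached[i];
	}

	/* Punch pages [start, end): lock, drop cache, update mapping, unlock. */
	static void toy_punch_hole(struct toy_mapping *m, int start, int end)
	{
		size_t len;

		assert(0 <= start && start <= end && end <= TOY_NPAGES);
		assert(!m->invalidate_writer && m->invalidate_readers == 0);
		len = (size_t)(end - start) * sizeof(bool);

		m->invalidate_writer = true;		/* lock exclusive */
		memset(&m->cached[start], 0, len);	/* invalidate page cache */
		/* A fault arriving now cannot reload the stale pages. */
		assert(!toy_fault(m, start));
		memset(&m->mapped[start], 0, len);	/* update block mapping */
		m->invalidate_writer = false;		/* unlock */
	}

In kernel code the exclusive side is normally taken with
filemap_invalidate_lock() and dropped with filemap_invalidate_unlock();
the model only shows why the mapping update has to finish before the lock
is released.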

->copy_file_range and ->remap_file_range implementations need to serialize
against modifications of file data while the operation is running. For
blocking changes through write(2) and similar operations, inode->i_rwsem can
be used. To block changes to file contents via a memory mapping during the
operation, the filesystem must take mapping->invalidate_lock to coordinate
with ->page_mkwrite.

dquot_operations
================

prototypes::

	int (*write_dquot) (struct dquot *);
	int (*acquire_dquot) (struct dquot *);
	int (*release_dquot) (struct dquot *);
	int (*mark_dirty) (struct dquot *);
	int (*write_info) (struct super_block *, int);

These operations are intended to be more or less wrapping functions that ensure
proper locking with respect to the filesystem and call the generic quota
operations.

What a filesystem should expect from the generic quota functions:

==============	============	=========================
ops		FS recursion	Held locks when called
==============	============	=========================
write_dquot:	yes		dqonoff_sem or dqptr_sem
acquire_dquot:	yes		dqonoff_sem or dqptr_sem
release_dquot:	yes		dqonoff_sem or dqptr_sem
mark_dirty:	no		-
write_info:	yes		dqonoff_sem
==============	============	=========================

FS recursion means calling ->quota_read() and ->quota_write() from superblock
operations.

More details about quota locking can be found in fs/quota/dquot.c.

vm_operations_struct
====================

prototypes::

	void (*open)(struct vm_area_struct *);
	void (*close)(struct vm_area_struct *);
	vm_fault_t (*fault)(struct vm_fault *);
	vm_fault_t (*huge_fault)(struct vm_fault *, unsigned int order);
	vm_fault_t (*map_pages)(struct vm_fault *, pgoff_t start, pgoff_t end);
	vm_fault_t (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
	vm_fault_t (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *);
	int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);

locking rules:

=============	==========	===========================
ops		mmap_lock	PageLocked(page)
=============	==========	===========================
open:		write
close:		read/write
fault:		read		can return with page locked
huge_fault:	maybe-read
map_pages:	maybe-read
page_mkwrite:	read		can return with page locked
pfn_mkwrite:	read
access:		read
=============	==========	===========================

->fault() is called when a previously not-present pte is about to be faulted
in. The filesystem must find and return the page associated with the passed in
"pgoff" in the vm_fault structure. If it is possible that the page may be
truncated and/or invalidated, then the filesystem must lock invalidate_lock,
then ensure the page is not already truncated (invalidate_lock will block
subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
locked. The VM will unlock the page.

->huge_fault() is called when there is no PUD or PMD entry present.  This
gives the filesystem the opportunity to install a PUD or PMD sized page.
Filesystems can also use the ->fault method to return a PMD sized page,
so implementing this function may not be necessary.  In particular,
filesystems should not call filemap_fault() from ->huge_fault().
The mmap_lock may not be held when this method is called.

->map_pages() is called when the VM asks to map easily accessible pages.
The filesystem should find and map pages associated with offsets from
"start_pgoff" till "end_pgoff". ->map_pages() is called with the RCU lock held
and must not block.  If it's not possible to reach a page without blocking,
the filesystem should skip it. The filesystem should use set_pte_range() to
set up the page table entry. A pointer to the entry associated with the page
is passed in the "pte" field of the vm_fault structure. Pointers to entries
for other offsets should be calculated relative to "pte".
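
The "skip what cannot be reached without blocking" rule can be sketched as
a userspace loop. The types and helpers below (struct toy_page,
toy_trylock_page(), toy_map_pages()) are hypothetical, the "mapped" flag
stands in for an installed pte, and treating the range as inclusive is an
assumption of this model.

.. code-block:: c

	#include <stdbool.h>

	/* Hypothetical page state for a single-threaded model. */
	struct toy_page {
		bool uptodate;	/* contents are valid */
		bool locked;	/* the page lock is held */
		bool mapped;	/* stands in for an installed pte */
	};

	/* Non-blocking lock attempt; never sleeps. */
	static bool toy_trylock_page(struct toy_page *page)
	{
		if (page->locked)
			return false;
		page->locked = true;
		return true;
	}

	/*
	 * Map every page in [start, end] that can be reached without
	 * blocking and skip the rest.  Returns the number of pages mapped.
	 */
	static int toy_map_pages(struct toy_page *pages, int start, int end)
	{
		int mapped = 0;

		for (int i = start; i <= end; i++) {
			struct toy_page *page = &pages[i];

			if (!toy_trylock_page(page))
				continue;	/* would have to wait: skip */
			if (page->uptodate) {
				page->mapped = true;
				mapped++;
			}			/* else it would need I/O: skip */
			page->locked = false;
		}
		return mapped;
	}

Pages that are skipped here are simply left for a later ->fault() call,
which is allowed to block.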

->page_mkwrite() is called when a previously read-only pte is about to become
writeable. The filesystem again must ensure that there are no
truncate/invalidate races or races with operations such as ->remap_file_range
or ->copy_file_range, and then return with the page locked. Usually
mapping->invalidate_lock is suitable for proper serialization. If the page has
been truncated, the filesystem should not look up a new page like the ->fault()
handler, but simply return with VM_FAULT_NOPAGE, which will cause the VM to
retry the fault.
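
A minimal userspace model of that check follows. The types and return
codes (struct toy_mapping, struct toy_page, TOY_FAULT_LOCKED,
TOY_FAULT_NOPAGE) are hypothetical and only echo the kernel names; a
counter stands in for invalidate_lock held in shared mode, and a cleared
mapping pointer marks a truncated page.

.. code-block:: c

	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical return codes; the values are not the kernel's. */
	enum toy_fault { TOY_FAULT_LOCKED, TOY_FAULT_NOPAGE };

	struct toy_mapping {
		int invalidate_readers;	/* shared holders of invalidate_lock */
	};

	struct toy_page {
		struct toy_mapping *mapping;	/* NULL once truncated */
		bool locked;
	};

	static enum toy_fault toy_page_mkwrite(struct toy_mapping *mapping,
					       struct toy_page *page)
	{
		enum toy_fault ret;

		mapping->invalidate_readers++;	/* lock shared */
		page->locked = true;		/* lock the page */
		if (page->mapping != mapping) {
			/* Truncated or invalidated: do not look up a new page. */
			page->locked = false;
			ret = TOY_FAULT_NOPAGE;
		} else {
			/* Still attached: return with the page locked. */
			ret = TOY_FAULT_LOCKED;
		}
		mapping->invalidate_readers--;	/* unlock */
		return ret;
	}

The point of the sketch is the asymmetry with ->fault(): on a truncated
page the handler gives up and lets the VM retry instead of finding a
replacement page itself.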

->pfn_mkwrite() is the same as page_mkwrite but when the pte is
VM_PFNMAP or VM_MIXEDMAP with a page-less entry. The expected return is
VM_FAULT_NOPAGE or one of the VM_FAULT_ERROR types. The default behavior
after this call is to make the pte read-write, unless pfn_mkwrite returns
an error.

->access() is called when get_user_pages() fails in
access_process_vm(), typically used to debug a process through
/proc/pid/mem or ptrace.  This function is needed only for
VM_IO | VM_PFNMAP VMAs.

--------------------------------------------------------------------------------

			Dubious stuff

(if you break something or notice that it is broken and do not fix it yourself
- at least put it here)