:orphan:

Making Filesystems Exportable
=============================

Overview
--------

All filesystem operations require a dentry (or two) as a starting
point.  Local applications have a reference-counted hold on suitable
dentries via open file descriptors or cwd/root.  However, remote
applications that access a filesystem via a remote filesystem protocol
such as NFS may not be able to hold such a reference, and so need a
different way to refer to a particular dentry.  As the alternative
form of reference needs to be stable across renames, truncates, and
server reboots (among other things, though these tend to be the most
problematic), there is no simple answer like 'filename'.

The mechanism discussed here allows each filesystem implementation to
specify how to generate an opaque (outside of the filesystem) byte
string for any dentry, and how to find an appropriate dentry for any
given opaque byte string.
This byte string will be called a "filehandle fragment" as it
corresponds to part of an NFS filehandle.

A filesystem which supports the mapping between filehandle fragments
and dentries will be termed "exportable".



Dcache Issues
-------------

The dcache normally contains a proper prefix of any given filesystem
tree.  This means that if any filesystem object is in the dcache, then
all of the ancestors of that filesystem object are also in the dcache.
As normal access is by filename, this prefix is created naturally and
maintained easily (by each object maintaining a reference count on
its parent).

However, when objects are included into the dcache by interpreting a
filehandle fragment, there is no automatic creation of a path prefix
for the object.  This leads to two related but distinct features of
the dcache that are not needed for normal filesystem access.

1. The dcache must sometimes contain objects that are not part of the
   proper prefix, i.e. that are not connected to the root.
2. The dcache must be prepared for a newly found (via ->lookup) directory
   to already have a (non-connected) dentry, and must be able to move
   that dentry into place (based on the parent and name in the
   ->lookup).  This is particularly needed for directories as
   it is a dcache invariant that directories only have one dentry.

To implement these features, the dcache has:

a. A dentry flag DCACHE_DISCONNECTED which is set on
   any dentry that might not be part of the proper prefix.
   This is set when anonymous dentries are created, and cleared when a
   dentry is noticed to be a child of a dentry which is in the proper
   prefix.  If the refcount on a dentry with this flag set
   becomes zero, the dentry is immediately discarded, rather than being
   kept in the dcache.  If a dentry that is not already in the dcache
   is repeatedly accessed by filehandle (as NFSD might do), a new dentry
   will be allocated for each access, and discarded at the end of
   the access.

   Note that such a dentry can acquire children, name, ancestors, etc.
   without losing DCACHE_DISCONNECTED - that flag is only cleared when
   the subtree is successfully reconnected to root.  Until then, dentries
   in such a subtree are retained only as long as there are references;
   refcount reaching zero means immediate eviction, same as for unhashed
   dentries.  That guarantees that we won't need to hunt them down upon
   umount.

b. A primitive for creation of secondary roots - d_obtain_root(inode).
   Those do _not_ bear DCACHE_DISCONNECTED.  They are placed on the
   per-superblock list (->s_roots), so they can be located at umount
   time for eviction purposes.

c. Helper routines to allocate anonymous dentries, and to help attach
   loose directory dentries at lookup time. They are:

    d_obtain_alias(inode) will return a dentry for the given inode.
      If the inode already has a dentry, one of those is returned.

      If it doesn't, a new anonymous (IS_ROOT and
      DCACHE_DISCONNECTED) dentry is allocated and attached.

      In the case of a directory, care is taken that only one dentry
      can ever be attached.

    d_splice_alias(inode, dentry) will introduce a new dentry into the tree;
      either the passed-in dentry or a preexisting alias for the given inode
      (such as an anonymous one created by d_obtain_alias), if appropriate.
      It returns NULL when the passed-in dentry is used, following the calling
      convention of ->lookup.
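
The lifetime rules above can be illustrated with a small userspace toy
model.  This is only a sketch: ``struct toy_inode``, ``struct toy_dentry``,
``toy_obtain_alias`` and ``toy_dput`` are hypothetical names invented for
illustration and are not the kernel's data structures or functions.  The
sketch assumes at most one alias per inode and shows two things: an
anonymous dentry is created IS_ROOT (its own parent) and flagged
disconnected, and a disconnected dentry is discarded as soon as its
refcount drops to zero.

```c
#include <stdlib.h>

/* Toy stand-ins for illustration only; NOT the kernel definitions. */
struct toy_dentry;

struct toy_inode {
	struct toy_dentry *alias;	/* the one dentry attached to this inode, if any */
};

struct toy_dentry {
	struct toy_inode *inode;
	struct toy_dentry *parent;	/* parent == self models IS_ROOT */
	int disconnected;		/* models DCACHE_DISCONNECTED */
	int refcount;
};

/* Return the existing alias, or allocate a new anonymous dentry. */
static struct toy_dentry *toy_obtain_alias(struct toy_inode *inode)
{
	struct toy_dentry *d = inode->alias;

	if (d != NULL) {
		d->refcount++;
		return d;
	}
	d = calloc(1, sizeof(*d));
	if (d == NULL)
		return NULL;
	d->inode = inode;
	d->parent = d;			/* anonymous: not attached to any parent */
	d->disconnected = 1;
	d->refcount = 1;
	inode->alias = d;
	return d;
}

/* Drop a reference.  Returns 1 if the dentry was discarded, 0 otherwise. */
static int toy_dput(struct toy_dentry *d)
{
	if (--d->refcount > 0)
		return 0;
	if (d->disconnected) {
		/* Disconnected dentries are never kept around unreferenced. */
		d->inode->alias = NULL;
		free(d);
		return 1;
	}
	return 0;			/* connected dentries stay cached */
}
```

In this model, two back-to-back accesses by filehandle that each obtain and
then drop the dentry allocate it twice, which mirrors the NFSD behaviour
described in item a.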

Filesystem Issues
-----------------

For a filesystem to be exportable it must:

   1. provide the filehandle fragment routines described below.
   2. make sure that d_splice_alias is used rather than d_add
      when ->lookup finds an inode for a given parent and name.

      If inode is NULL, d_splice_alias(inode, dentry) is equivalent to::

		d_add(dentry, inode), NULL

      Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)

      Typically the ->lookup routine will simply end with a::

		return d_splice_alias(inode, dentry);
	}
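
The calling convention of d_splice_alias can be sketched with a userspace
toy model.  All names here (``struct toy_inode``, ``struct toy_dentry``,
``toy_splice_alias``) are hypothetical and exist only for illustration; the
model covers just three cases (a NULL inode, a directory that already has
an anonymous alias, and everything else) and deliberately ignores the
ERR_PTR case, locking, and the DCACHE_DISCONNECTED flag.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel definitions. */
struct toy_dentry;

struct toy_inode {
	int is_dir;
	struct toy_dentry *alias;	/* an existing dentry for this inode, if any */
};

struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;	/* parent == self models an anonymous root */
	struct toy_inode *inode;	/* NULL models a negative dentry */
};

/*
 * Returns NULL when the passed-in dentry is the one that ends up in the
 * tree, or the preexisting alias when that one was moved into place.
 */
static struct toy_dentry *toy_splice_alias(struct toy_inode *inode,
					   struct toy_dentry *dentry)
{
	struct toy_dentry *alias;

	if (inode == NULL) {
		dentry->inode = NULL;	/* like d_add(dentry, NULL) */
		return NULL;
	}

	alias = inode->alias;
	if (inode->is_dir && alias != NULL && alias->parent == alias) {
		/* Move the anonymous directory dentry to the looked-up place. */
		alias->parent = dentry->parent;
		alias->name = dentry->name;
		return alias;
	}

	dentry->inode = inode;		/* like d_add(dentry, inode) */
	if (inode->alias == NULL)
		inode->alias = dentry;
	return NULL;
}
```

A caller in the style of ->lookup would use the returned dentry if it is
non-NULL and otherwise keep using the dentry it passed in.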



A file system implementation declares that instances of the filesystem
are exportable by setting the s_export_op field in the struct
super_block.  This field must point to a "struct export_operations"
struct which has the following members:

  encode_fh (optional)
    Takes a dentry and creates a filehandle fragment which can later be used
    to find or create a dentry for the same object.  The default
    implementation creates a filehandle fragment that encodes a 32-bit inode
    and generation number for the inode encoded, and if necessary the
    same information for the parent.

  fh_to_dentry (mandatory)
    Given a filehandle fragment, this should find the implied object and
    create a dentry for it (possibly with d_obtain_alias).

  fh_to_parent (optional but strongly recommended)
    Given a filehandle fragment, this should find the parent of the
    implied object and create a dentry for it (possibly with
    d_obtain_alias).  May fail if the filehandle fragment is too small.

  get_parent (optional but strongly recommended)
    When given a dentry for a directory, this should return a dentry for
    the parent.  Quite possibly the parent dentry will have been allocated
    by d_obtain_alias.  The default get_parent function just returns an error
    so any filehandle lookup that requires finding a parent will fail.
    ->lookup("..") is *not* used as a default as it can leave ".." entries
    in the dcache which are too messy to work with.

  get_name (optional)
    When given a parent dentry and a child dentry, this should find a name
    in the directory identified by the parent dentry, which leads to the
    object identified by the child dentry.  If no get_name function is
    supplied, a default implementation is provided which uses vfs_readdir
    to find potential names, and matches inode numbers to find the correct
    match.


A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one-byte "type".
The decoding routines (fh_to_dentry and fh_to_parent) should not depend
on the stated size that is passed to them.  This size may be larger than
the original filehandle generated by encode_fh, in which case it will
have been padded with nuls.  Rather, the encode_fh routine should choose
a "type" which tells the decoding routines how much of the filehandle is
valid, and how it should be interpreted.
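
The role of the "type" byte can be sketched with a userspace toy encoder
and decoder.  Everything below is an assumption made for illustration:
``struct toy_obj``, the ``TOY_FH_*`` values, and both functions are
hypothetical, and the layout (inode number and generation, optionally
followed by the same pair for the parent) merely follows the description
of the default encode_fh above.  The point is that the decoder derives the
number of valid words from the type, so a padded, larger stated size does
not change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type values for this sketch. */
enum {
	TOY_FH_INO_GEN = 1,		/* 2 words: ino, gen */
	TOY_FH_INO_GEN_PARENT = 2,	/* 4 words: ino, gen, parent ino, parent gen */
	TOY_FH_INVALID = 255
};

struct toy_obj {
	uint32_t ino, gen;
	uint32_t parent_ino, parent_gen;
};

/*
 * Encode obj into fh.  *max_len is the buffer size in 4-byte words on
 * entry and the number of words needed on return.  Returns the type.
 */
static int toy_encode_fh(const struct toy_obj *obj, uint32_t *fh,
			 int *max_len, int want_parent)
{
	int need = want_parent ? 4 : 2;

	if (*max_len < need) {
		*max_len = need;
		return TOY_FH_INVALID;
	}
	fh[0] = obj->ino;
	fh[1] = obj->gen;
	if (want_parent) {
		fh[2] = obj->parent_ino;
		fh[3] = obj->parent_gen;
	}
	*max_len = need;
	return want_parent ? TOY_FH_INO_GEN_PARENT : TOY_FH_INO_GEN;
}

/* Decode using the type, not fh_len, to decide how many words are valid. */
static int toy_decode_fh(const uint32_t *fh, int fh_len, int fh_type,
			 struct toy_obj *out)
{
	int valid;

	switch (fh_type) {
	case TOY_FH_INO_GEN:
		valid = 2;
		break;
	case TOY_FH_INO_GEN_PARENT:
		valid = 4;
		break;
	default:
		return -1;
	}
	if (fh_len < valid)
		return -1;		/* too short for the claimed type */

	out->ino = fh[0];
	out->gen = fh[1];
	out->parent_ino = (valid == 4) ? fh[2] : 0;
	out->parent_gen = (valid == 4) ? fh[3] : 0;
	return 0;
}
```

With this scheme a two-word fragment handed back with a stated length of
six words decodes exactly as it would with a stated length of two, because
the trailing nul padding is never consulted.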