:orphan:

Making Filesystems Exportable
=============================

Overview
--------

All filesystem operations require a dentry (or two) as a starting
point. Local applications have a reference-counted hold on suitable
dentries via open file descriptors or cwd/root. However, remote
applications that access a filesystem via a remote filesystem protocol
such as NFS may not be able to hold such a reference, and so need a
different way to refer to a particular dentry. As the alternative
form of reference needs to be stable across renames, truncates, and
server reboots (among other things, though these tend to be the most
problematic), there is no simple answer like 'filename'.

The mechanism discussed here allows each filesystem implementation to
specify how to generate an opaque (outside of the filesystem) byte
string for any dentry, and how to find an appropriate dentry for any
given opaque byte string.
This byte string will be called a "filehandle fragment" as it
corresponds to part of an NFS filehandle.

A filesystem which supports the mapping between filehandle fragments
and dentries will be termed "exportable".



Dcache Issues
-------------

The dcache normally contains a proper prefix of any given filesystem
tree. This means that if any filesystem object is in the dcache, then
all of the ancestors of that filesystem object are also in the dcache.
As normal access is by filename, this prefix is created naturally and
maintained easily (by each object maintaining a reference count on
its parent).

However, when objects are included into the dcache by interpreting a
filehandle fragment, there is no automatic creation of a path prefix
for the object. This leads to two related but distinct features of
the dcache that are not needed for normal filesystem access.

1. The dcache must sometimes contain objects that are not part of the
   proper prefix, i.e. that are not connected to the root.
2. The dcache must be prepared for a newly found (via ->lookup) directory
   to already have a (non-connected) dentry, and must be able to move
   that dentry into place (based on the parent and name in the
   ->lookup). This is particularly needed for directories as
   it is a dcache invariant that directories only have one dentry.

To implement these features, the dcache has:

a. A dentry flag DCACHE_DISCONNECTED which is set on
   any dentry that might not be part of the proper prefix.
   This is set when anonymous dentries are created, and cleared when a
   dentry is noticed to be a child of a dentry which is in the proper
   prefix. If the refcount on a dentry with this flag set
   becomes zero, the dentry is immediately discarded, rather than being
   kept in the dcache. If a dentry that is not already in the dcache
   is repeatedly accessed by filehandle (as NFSD might do), a new dentry
   will be allocated for each access, and discarded at the end of
   the access.

   Note that such a dentry can acquire children, name, ancestors, etc.
   without losing DCACHE_DISCONNECTED - that flag is only cleared when
   the subtree is successfully reconnected to root.
   Until then dentries
   in such a subtree are retained only as long as there are references;
   refcount reaching zero means immediate eviction, same as for unhashed
   dentries. That guarantees that we won't need to hunt them down upon
   umount.

b. A primitive for creation of secondary roots - d_obtain_root(inode).
   Those do _not_ bear DCACHE_DISCONNECTED. They are placed on the
   per-superblock list (->s_roots), so they can be located at umount
   time for eviction purposes.

c. Helper routines to allocate anonymous dentries, and to help attach
   loose directory dentries at lookup time. They are:

    d_obtain_alias(inode) will return a dentry for the given inode.
      If the inode already has a dentry, one of those is returned.

      If it doesn't, a new anonymous (IS_ROOT and
      DCACHE_DISCONNECTED) dentry is allocated and attached.

      In the case of a directory, care is taken that only one dentry
      can ever be attached.

    d_splice_alias(inode, dentry) will introduce a new dentry into the tree;
      either the passed-in dentry or a preexisting alias for the given inode
      (such as an anonymous one created by d_obtain_alias), if appropriate.
      It returns NULL when the passed-in dentry is used, following the calling
      convention of ->lookup.

Filesystem Issues
-----------------

For a filesystem to be exportable it must:

   1. provide the filehandle fragment routines described below.
   2. make sure that d_splice_alias is used rather than d_add
      when ->lookup finds an inode for a given parent and name.

      If inode is NULL, d_splice_alias(inode, dentry) is equivalent to::

          d_add(dentry, inode), NULL

      Similarly, d_splice_alias(ERR_PTR(err), dentry) returns ERR_PTR(err).

      Typically the ->lookup routine will simply end with a::

          return d_splice_alias(inode, dentry);
        }



A filesystem implementation declares that instances of the filesystem
are exportable by setting the s_export_op field in the struct
super_block.
This field must point to a struct export_operations,
which has the following members:

  encode_fh (optional)
    Takes a dentry and creates a filehandle fragment which can later be used
    to find or create a dentry for the same object. The default
    implementation creates a filehandle fragment that encodes a 32-bit inode
    number and generation number for the inode encoded, and if necessary the
    same information for the parent.

  fh_to_dentry (mandatory)
    Given a filehandle fragment, this should find the implied object and
    create a dentry for it (possibly with d_obtain_alias).

  fh_to_parent (optional but strongly recommended)
    Given a filehandle fragment, this should find the parent of the
    implied object and create a dentry for it (possibly with
    d_obtain_alias). May fail if the filehandle fragment is too small.

  get_parent (optional but strongly recommended)
    When given a dentry for a directory, this should return a dentry for
    the parent. Quite possibly the parent dentry will have been allocated
    by d_obtain_alias. The default get_parent function just returns an error
    so any filehandle lookup that requires finding a parent will fail.
    ->lookup("..") is *not* used as a default as it can leave ".." entries
    in the dcache which are too messy to work with.

  get_name (optional)
    When given a parent dentry and a child dentry, this should find a name
    in the directory identified by the parent dentry, which leads to the
    object identified by the child dentry. If no get_name function is
    supplied, a default implementation is provided which uses iterate_dir
    to find potential names, and matches inode numbers to find the correct
    match.


A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one-byte "type".
The decoding routines (fh_to_dentry and fh_to_parent) should not depend
on the stated size that is passed to them. This size may be larger than
the original filehandle generated by encode_fh, in which case it will
have been padded with NUL bytes. Rather, the encode_fh routine should
choose a "type" which tells the decoding routines how much of the
filehandle is valid, and how it should be interpreted.
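
The rule that decoding is driven by the "type" rather than by the stated
size can be illustrated with a small userspace model. This is only a
sketch, not kernel code: the names demo_obj, demo_encode_fh,
demo_decode_fh and the two DEMO_FH_* type values are hypothetical and
chosen for illustration, and the word layout (inode number, generation,
then optionally the same pair for the parent) is an assumption loosely
following the default encoding described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment types; a real filesystem defines its own. */
enum {
	DEMO_FH_INO_GEN        = 1,	/* 2 words: inode number, generation */
	DEMO_FH_INO_GEN_PARENT = 2	/* 4 words: as above, plus parent's pair */
};

/* Stand-in for the object a dentry refers to (illustration only). */
struct demo_obj {
	uint32_t ino, gen;
	uint32_t parent_ino, parent_gen;
};

/*
 * Encode: fill words[] and return the chosen type. *len is the buffer
 * size in 4-byte words on entry and the number of words used on exit.
 * Returns -1 if the buffer is too small.
 */
int demo_encode_fh(const struct demo_obj *o, uint32_t *words, int *len,
		   int want_parent)
{
	int need = want_parent ? 4 : 2;

	assert(o && words && len);
	if (*len < need)
		return -1;
	words[0] = o->ino;
	words[1] = o->gen;
	if (want_parent) {
		words[2] = o->parent_ino;
		words[3] = o->parent_gen;
	}
	*len = need;
	return want_parent ? DEMO_FH_INO_GEN_PARENT : DEMO_FH_INO_GEN;
}

/*
 * Decode: the type alone decides how many words are meaningful. The
 * stated len may be larger because of NUL padding, so it is used only
 * as a lower-bound check and never to infer the layout.
 */
int demo_decode_fh(const uint32_t *words, int len, int type,
		   struct demo_obj *out)
{
	int valid;

	switch (type) {
	case DEMO_FH_INO_GEN:
		valid = 2;
		break;
	case DEMO_FH_INO_GEN_PARENT:
		valid = 4;
		break;
	default:
		return -1;	/* unknown type */
	}
	if (len < valid)
		return -1;	/* fragment too small for this type */

	memset(out, 0, sizeof(*out));
	out->ino = words[0];
	out->gen = words[1];
	if (valid == 4) {
		out->parent_ino = words[2];
		out->parent_gen = words[3];
	}
	return 0;
}
```

In this model, a two-word fragment that arrives padded out to eight
words still decodes to the same inode number and generation, because
the decoder reads only the words that the type declares valid.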