=====================
autofs - how it works
=====================

Purpose
=======

The goal of autofs is to provide on-demand mounting and race-free
automatic unmounting of various other filesystems.  This provides two
key advantages:

1. There is no need to delay boot until all filesystems that
   might be needed are mounted.  Processes that try to access those
   slow filesystems might be delayed but other processes can
   continue freely.  This is particularly important for
   network filesystems (e.g. NFS) or filesystems stored on
   media with a media-changing robot.

2. The names and locations of filesystems can be stored in
   a remote database and can change at any time.  The content
   in that database at the time of access will be used to provide
   a target for the access.  The interpretation of names in the
   filesystem can even be programmatic rather than database-backed,
   allowing wildcards for example, and can vary based on the user who
   first accessed a name.

Context
=======

The "autofs" filesystem module is only one part of an autofs system.
There also needs to be a user-space program which looks up names
and mounts filesystems.  This will often be the "automount" program,
though other tools including "systemd" can make use of "autofs".
This document describes only the kernel module and the interactions
required with any user-space program.  Subsequent text refers to this
as the "automount daemon" or simply "the daemon".

"autofs" is a Linux kernel module which provides the "autofs"
filesystem type.  Several "autofs" filesystems can be mounted and they
can each be managed separately, or all managed by the same daemon.

Content
=======

An autofs filesystem can contain three sorts of objects: directories,
symbolic links and mount traps.  Mount traps are directories with
extra properties as described in the next section.

Objects can only be created by the automount daemon: symlinks are
created with a regular `symlink` system call, while directories and
mount traps are created with `mkdir`.  The determination of whether a
directory should be a mount trap is based on a master map. This master
map is consulted by autofs to determine which directories are mount
points. Mount points can be *direct*, *indirect*, or *offset*.
On most systems, the default master map is located at */etc/auto.master*.

If neither the *direct* nor the *offset* mount option is given (so the
mount is considered to be *indirect*), then the root directory is
always a regular directory; otherwise it is a mount trap when it is
empty and a regular directory when not empty.  Note that *direct* and
*offset* are treated identically so a concise summary is that the root
directory is a mount trap only if the filesystem is mounted *direct*
and the root is empty.

Directories created in the root directory are mount traps only if the
filesystem is mounted *indirect* and they are empty.

Directories further down the tree depend on the *maxproto* mount
option and particularly whether it is less than five or not.
When *maxproto* is five, no directories further down the
tree are ever mount traps; they are always regular directories.  When
the *maxproto* is four (or three), these directories are mount traps
precisely when they are empty.

So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
directories are sometimes mount traps and sometimes not, depending on
where in the tree they are (root, top level, or lower), the *maxproto*,
and whether the mount was *indirect* or not.
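
The rules in this section can be condensed into one predicate.  The
following is only a user-space sketch of the rules as stated above, not
the kernel implementation; the names `trap_depth` and `is_mount_trap`
are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the rules in this section; not kernel code. */
enum trap_depth { TRAP_ROOT, TRAP_TOP_LEVEL, TRAP_LOWER };

static bool is_mount_trap(enum trap_depth depth, bool empty,
			  bool indirect, int maxproto)
{
	if (!empty)
		return false;		/* non-empty directories are never traps */

	switch (depth) {
	case TRAP_ROOT:
		return !indirect;	/* direct and offset are treated alike */
	case TRAP_TOP_LEVEL:
		return indirect;	/* only in an indirect mount */
	case TRAP_LOWER:
		return maxproto < 5;	/* protocol 3 or 4 only */
	}
	return false;
}
```

For example, an empty top-level directory in an *indirect* mount is a
trap, while an empty lower-level directory is a trap only when
*maxproto* is below five.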

Mount Traps
===========

A core element of the implementation of autofs is the mount traps
that are provided by the Linux VFS.  Any directory provided by a
filesystem can be designated as a trap.  This involves two separate
features that work together to allow autofs to do its job.

**DCACHE_NEED_AUTOMOUNT**

If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
the inode has S_AUTOMOUNT set, or can be set directly) then it is
(potentially) a mount trap.  Any access to this directory beyond a
"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
to be called. The task of this method is to find the filesystem that
should be mounted on the directory and to return it.  The VFS is
responsible for actually mounting the root of this filesystem on the
directory.

autofs doesn't find the filesystem itself but sends a message to the
automount daemon asking it to find and mount the filesystem.  The
autofs `d_automount` method then waits for the daemon to report that
everything is ready.  It will then return "`NULL`" indicating that the
mount has already happened.  The VFS doesn't try to mount anything but
follows down the mount that is already there.

This functionality is sufficient for some users of mount traps such
as NFS which creates traps so that mountpoints on the server can be
reflected on the client.  However, it is not sufficient for autofs.  As
mounting onto a directory is considered to be "beyond a `stat`", the
automount daemon would not be able to mount a filesystem on the 'trap'
directory without some way to avoid getting caught in the trap.  For
that purpose there is another flag.

**DCACHE_MANAGE_TRANSIT**

If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
related behaviours are invoked, both using the `d_op->d_manage()`
dentry operation.

Firstly, before checking to see if any filesystem is mounted on the
directory, d_manage() will be called with the `rcu_walk` parameter set
to `false`.  It may return one of three things:

-  A return value of zero indicates that there is nothing special
   about this dentry and normal checks for mounts and automounts
   should proceed.

   autofs normally returns zero, but first waits for any
   expiry (automatic unmounting of the mounted filesystem) to
   complete.  This avoids races.

-  A return value of `-EISDIR` tells the VFS to ignore any mounts
   on the directory and to not consider calling `->d_automount()`.
   This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag
   causing the directory not to be a mount trap after all.

   autofs returns this if it detects that the process performing the
   lookup is the automount daemon and that the mount has been
   requested but has not yet completed.  How it determines this is
   discussed later.  This allows the automount daemon not to get
   caught in the mount trap.

   There is a subtlety here.  It is possible that a second autofs
   filesystem can be mounted below the first and for both of them to
   be managed by the same daemon.  For the daemon to be able to mount
   something on the second it must be able to "walk" down past the
   first.  This means that d_manage cannot *always* return -EISDIR for
   the automount daemon.  It must only return it when a mount has
   been requested, but has not yet completed.

   `d_manage` also returns `-EISDIR` if the dentry shouldn't be a
   mount trap, either because it is a symbolic link or because it is
   not empty.

-  Any other negative value is treated as an error and returned
   to the caller.

   autofs can return

   - -ENOENT if the automount daemon failed to mount anything,
   - -ENOMEM if it ran out of memory,
   - -EINTR if a signal arrived while waiting for expiry to
     complete,
   - or any other error sent down by the automount daemon.


The second use case only occurs during an "RCU-walk" and so `rcu_walk`
will be set.

An RCU-walk is a fast and lightweight process for walking down a
filename path (i.e. it is like running on tip-toes).  RCU-walk cannot
cope with all situations so when it finds a difficulty it falls back
to "REF-walk", which is slower but more robust.

RCU-walk will never call `->d_automount`; the filesystems must already
be mounted or RCU-walk cannot handle the path.
To determine if a mount-trap is safe for RCU-walk mode it calls
`->d_manage()` with `rcu_walk` set to `true`.

In this case `d_manage()` must avoid blocking and should avoid taking
spinlocks if at all possible.  Its sole purpose is to determine if it
would be safe to follow down into any mounted directory and the only
reason that it might not be is if an expiry of the mount is
underway.

In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the
VFS that this is a directory that doesn't require d_automount.  If
`rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
mounted, it *will* fall back to REF-walk.  `d_manage()` cannot make the
VFS remain in RCU-walk mode, but can only tell it to get out of
RCU-walk mode by returning `-ECHILD`.

So `d_manage()`, when called with `rcu_walk` set, should return
-ECHILD if there is any reason to believe it is unsafe to enter the
mounted filesystem; otherwise it should return 0.

autofs will return `-ECHILD` if an expiry of the filesystem has been
initiated or is being considered; otherwise it returns 0.
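
The return values described for both walk modes can be summarised in a
small user-space model.  This is a sketch of the behaviour described in
this section only; `struct trap_state` and `model_d_manage` are
hypothetical names, and the blocking wait for expiry and the error
returns (-ENOENT, -ENOMEM, -EINTR) are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of a dentry's state; field names are illustrative. */
struct trap_state {
	bool is_symlink;
	bool is_empty;
	bool expiring;		/* expiry initiated or being considered */
	bool mount_pending;	/* mount requested but not yet completed */
	bool caller_is_daemon;
};

static int model_d_manage(const struct trap_state *s, bool rcu_walk)
{
	/* RCU-walk: never block, only report whether it is safe to go on. */
	if (rcu_walk)
		return s->expiring ? -ECHILD : 0;

	/* REF-walk: the real method would first wait for any expiry to end. */
	if (s->caller_is_daemon && s->mount_pending)
		return -EISDIR;		/* let the daemon past its own trap */
	if (s->is_symlink || !s->is_empty)
		return -EISDIR;		/* not a mount trap at all */
	return 0;			/* proceed with normal mount checks */
}
```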


Mountpoint expiry
=================

The VFS has a mechanism for automatically expiring unused mounts,
much as it can expire any unused dentry information from the dcache.
This is guided by the MNT_SHRINKABLE flag.  This only applies to
mounts that were created by `d_automount()` returning a filesystem to be
mounted.  As autofs doesn't return such a filesystem but leaves the
mounting to the automount daemon, it must involve the automount daemon
in unmounting as well.  This also means that autofs has more control
over expiry.

The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
the `umount` system call.  Unmounting with MNT_EXPIRE will fail unless
a previous attempt had been made, and the filesystem has been inactive
and untouched since that previous attempt.  autofs does not depend on
this but has its own internal tracking of whether filesystems were
recently used.  This allows individual names in the autofs directory
to expire separately.

With version 4 of the protocol, the automount daemon can try to
unmount any filesystems mounted on the autofs filesystem or remove any
symbolic links or empty directories any time it likes.  If the unmount
or removal is successful the filesystem will be returned to the state
it was in before the mount or creation, so that any access of the name
will trigger normal auto-mount processing.  In particular, `rmdir` and
`unlink` do not leave negative entries in the dcache as a normal
filesystem would, so an attempt to access a recently-removed object is
passed to autofs for handling.

With version 5, this is not safe except for unmounting from top-level
directories.  As lower-level directories are never mount traps, other
processes will see an empty directory as soon as the filesystem is
unmounted.  So it is generally safest to use the autofs expiry
protocol described below.

Normally the daemon only wants to remove entries which haven't been
used for a while.  For this purpose autofs maintains a "`last_used`"
time stamp on each directory or symlink.  For symlinks it genuinely
does record the last time the symlink was "used" or followed to find
out where it points to.  For directories the field is used slightly
differently.  The field is updated at mount time and during expire
checks if it is found to be in use (i.e. open file descriptor or
process working directory) and during path walks. The update done
during path walks prevents frequent expire and immediate mount of
frequently accessed automounts. But in the case where a GUI continually
accesses or an application frequently scans an autofs directory tree,
there can be an accumulation of mounts that aren't actually being
used. To cater for this case, the "`strictexpire`" autofs mount option
can be used to avoid the "`last_used`" update on path walk, thereby
preventing this apparent inability to expire mounts that aren't
really in use.

The daemon is able to ask autofs if anything is due to be expired,
using an `ioctl` as discussed later.  For a *direct* mount, autofs
considers if the entire mount-tree can be unmounted or not.  For an
*indirect* mount, autofs considers each of the names in the top level
directory to determine if any of those can be unmounted and cleaned
up.

There is an option with indirect mounts to consider each of the leaves
that has been mounted on instead of considering the top-level names.
This was originally intended for compatibility with version 4 of autofs
and should be considered as deprecated for Sun Format automount maps.
However, it may be used again for amd format mount maps (which are
generally indirect maps) because the amd automounter allows for the
setting of an expire timeout for individual mounts. But there are
some difficulties in making the needed changes for this.

When autofs considers a directory it checks the `last_used` time and
compares it with the "timeout" value set when the filesystem was
mounted, though this check is ignored in some cases. It also checks if
the directory or anything below it is in use.  For symbolic links,
only the `last_used` time is ever considered.

If both appear to support expiring the directory or symlink, an action
is taken.
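
The two checks can be illustrated with a small model.  This is a
sketch under stated assumptions rather than the kernel's logic:
`struct entry`, `may_expire` and the `ignore_timeout` parameter (which
stands in for the cases where the timeout check is ignored) are
hypothetical, and the exact comparison against the timeout is an
assumption.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical view of one autofs object for the purpose of this sketch. */
struct entry {
	bool is_symlink;
	bool in_use;		/* open file or working directory at or below it */
	time_t last_used;
};

static bool may_expire(const struct entry *e, time_t now, time_t timeout,
		       bool ignore_timeout)
{
	/* Idle long enough, unless the caller asked to skip this check. */
	bool idle = ignore_timeout || now - e->last_used >= timeout;

	if (e->is_symlink)
		return idle;		/* only last_used is considered */
	return idle && !e->in_use;	/* directories must also be unused */
}
```

In this model a directory last used 50 seconds ago does not expire with
a 60 second timeout, but does once the timeout check is skipped, as long
as nothing at or below it is in use.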

There are two ways to ask autofs to consider expiry.  The first is to
use the **AUTOFS_IOC_EXPIRE** ioctl.  This only works for indirect
mounts.  If it finds something in the root directory to expire it will
return the name of that thing.  Once a name has been returned the
automount daemon needs to unmount any filesystems mounted below the
name normally.  As described above, this is unsafe for non-toplevel
mounts in a version-5 autofs.  For this reason the current `automount(8)`
does not use this ioctl.

The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
the **AUTOFS_IOC_EXPIRE_MULTI** ioctl.  This will work for both direct and
indirect mounts.  If it selects an object to expire, it will notify
the daemon using the notification mechanism described below.  This
will block until the daemon acknowledges the expiry notification.
This implies that the "`EXPIRE`" ioctl must be sent from a different
thread than the one which handles notification.

While the ioctl is blocking, the entry is marked as "expiring" and
`d_manage` will block until the daemon affirms that the unmount has
completed (together with removing any directories that might have been
necessary), or has been aborted.

Communicating with autofs: detecting the daemon
===============================================

There are several forms of communication between the automount daemon
and the filesystem.  As we have already seen, the daemon can create and
remove directories and symlinks using normal filesystem operations.
autofs knows whether a process requesting some operation is the daemon
or not based on its process-group id number (see getpgid(2)).

When an autofs filesystem is mounted the pgid of the mounting
process is recorded unless the "pgrp=" option is given, in which
case that number is recorded instead.  Any request arriving from a
process in that process group is considered to come from the daemon.
If the daemon ever has to be stopped and restarted, a new pgid can be
provided through an ioctl as will be described below.
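
As a minimal sketch, a daemon could look up its own process group and
format the "pgrp=" option as follows.  The helper names
`format_pgrp_option` and `default_daemon_pgrp` are hypothetical; only
`getpgid()` and `snprintf()` are standard calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for getpgid() */
#endif
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write "pgrp=<id>" into buf; returns the length snprintf() reports. */
static int format_pgrp_option(char *buf, size_t len, pid_t pgrp)
{
	return snprintf(buf, len, "pgrp=%ld", (long)pgrp);
}

/* The process group of the calling process, recorded when "pgrp=" is absent. */
static pid_t default_daemon_pgrp(void)
{
	return getpgid(0);	/* 0 means "the calling process" */
}
```

The resulting string would be combined with the other mount options,
separated by commas, in the data argument of `mount(2)`.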

Communicating with autofs: the event pipe
=========================================

When an autofs filesystem is mounted, the 'write' end of a pipe must
be passed using the 'fd=' mount option.  autofs will write
notification messages to this pipe for the daemon to respond to.
For version 5, the format of the message is::

	struct autofs_v5_packet {
		struct autofs_packet_hdr hdr;
		autofs_wqt_t wait_queue_token;
		__u32 dev;
		__u64 ino;
		__u32 uid;
		__u32 gid;
		__u32 pid;
		__u32 tgid;
		__u32 len;
		char name[NAME_MAX+1];
	};

And the format of the header is::

	struct autofs_packet_hdr {
		int proto_version;		/* Protocol version */
		int type;			/* Type of packet */
	};

where the type is one of ::

	autofs_ptype_missing_indirect
	autofs_ptype_expire_indirect
	autofs_ptype_missing_direct
	autofs_ptype_expire_direct

so messages can indicate that a name is missing (something tried to
access it but it isn't there) or that it has been selected for expiry.
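
A daemon typically branches on the packet type before doing anything
else.  The helpers below are a sketch of that classification; their
names are hypothetical, and the example assumes the installed
`<linux/auto_fs.h>` provides the version 5 definitions shown above.

```c
#include <linux/auto_fs.h>
#include <stdbool.h>

/* True if the packet asks the daemon to look up and mount a name. */
static bool packet_is_missing(const struct autofs_packet_hdr *hdr)
{
	return hdr->type == autofs_ptype_missing_indirect ||
	       hdr->type == autofs_ptype_missing_direct;
}

/* True if the packet reports that an object was selected for expiry. */
static bool packet_is_expire(const struct autofs_packet_hdr *hdr)
{
	return hdr->type == autofs_ptype_expire_indirect ||
	       hdr->type == autofs_ptype_expire_direct;
}

/* True if the packet concerns a direct mount. */
static bool packet_is_direct(const struct autofs_packet_hdr *hdr)
{
	return hdr->type == autofs_ptype_missing_direct ||
	       hdr->type == autofs_ptype_expire_direct;
}
```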

The pipe will be set to "packet mode" (equivalent to passing
`O_DIRECT` to _pipe2(2)_) so that a read from the pipe will return at
most one packet, and any unread portion of a packet will be discarded.
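
Packet mode can be observed without autofs by creating such a pipe
directly.  The following sketch writes two messages of 10 and 20 bytes
and then reads twice; `packet_pipe_demo` is a hypothetical helper, and
it returns -1 if the running kernel or sandbox does not support
`O_DIRECT` pipes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for pipe2() and O_DIRECT */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a 10-byte and a 20-byte message into a packet-mode pipe, then
 * read twice using at most bufsz bytes per read.  The two read() results
 * are stored in got[].  Returns 0 on success, -1 on failure.
 */
static int packet_pipe_demo(size_t bufsz, ssize_t got[2])
{
	static const char a[10], b[20];	/* two "packets" of different sizes */
	char buf[64];
	int fds[2], ok = -1;

	if (bufsz > sizeof(buf) || pipe2(fds, O_DIRECT) < 0)
		return -1;
	if (write(fds[1], a, sizeof(a)) == (ssize_t)sizeof(a) &&
	    write(fds[1], b, sizeof(b)) == (ssize_t)sizeof(b)) {
		got[0] = read(fds[0], buf, bufsz);	/* first packet only */
		got[1] = read(fds[0], buf, bufsz);	/* second packet only */
		ok = 0;
	}
	close(fds[0]);
	close(fds[1]);
	return ok;
}
```

With a 64-byte buffer the two reads should return 10 and 20 bytes, one
packet each.  With a 4-byte buffer each read should return 4 bytes, and
the remainder of each packet is dropped rather than handed to the next
read.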

The `wait_queue_token` is a unique number which can identify a
particular request to be acknowledged.  When a message is sent over
the pipe the affected dentry is marked as either "active" or
"expiring" and other accesses to it block until the message is
acknowledged using one of the ioctls below with the relevant
`wait_queue_token`.

Communicating with autofs: root directory ioctls
================================================

The root directory of an autofs filesystem will respond to a number of
ioctls.  The process issuing the ioctl must have the CAP_SYS_ADMIN
capability, or must be the automount daemon.

The available ioctl commands are:

- **AUTOFS_IOC_READY**:
	a notification has been handled.  The argument
	to the ioctl command is the "wait_queue_token" number
	corresponding to the notification being acknowledged.
- **AUTOFS_IOC_FAIL**:
	similar to above, but indicates failure with
	the error code `ENOENT`.
- **AUTOFS_IOC_CATATONIC**:
	causes autofs to enter "catatonic"
	mode meaning that it stops sending notifications to the daemon.
	This mode is also entered if a write to the pipe fails.
- **AUTOFS_IOC_PROTOVER**:
	This returns the protocol version in use.
- **AUTOFS_IOC_PROTOSUBVER**:
	Returns the protocol sub-version which
	is really a version number for the implementation.
- **AUTOFS_IOC_SETTIMEOUT**:
	This passes a pointer to an unsigned
	long.  The value is used to set the timeout for expiry, and
	the current timeout value is stored back through the pointer.
- **AUTOFS_IOC_ASKUMOUNT**:
	Returns, in the pointed-to `int`, 1 if
	the filesystem could be unmounted.  This is only a hint as
	the situation could change at any instant.  This call can be
	used to avoid a more expensive full unmount attempt.
- **AUTOFS_IOC_EXPIRE**:
	as described above, this asks if there is
	anything suitable to expire.  A pointer to a packet::

		struct autofs_packet_expire_multi {
			struct autofs_packet_hdr hdr;
			autofs_wqt_t wait_queue_token;
			int len;
			char name[NAME_MAX+1];
		};

	is required.  This is filled in with the name of something
	that can be unmounted or removed.  If nothing can be expired,
	`errno` is set to `EAGAIN`.  Even though a `wait_queue_token`
	is present in the structure, no "wait queue" is established
	and no acknowledgment is needed.
- **AUTOFS_IOC_EXPIRE_MULTI**:
	This is similar to
	**AUTOFS_IOC_EXPIRE** except that it causes notification to be
	sent to the daemon, and it blocks until the daemon acknowledges.
	The argument is an integer which can contain the following flags.

	**AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
	and objects are expired if they are not in use.

	**AUTOFS_EXP_FORCED** causes the in use status to be ignored
	and objects are expired even if they are in use. This assumes
	that the daemon has requested this because it is capable of
	performing the umount.

	**AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
	name to expire.  This is only safe when *maxproto* is 4.
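
As a sketch of how a daemon might issue two of these ioctls, the
helpers below acknowledge a notification and request an expire run.
The function names are hypothetical.  Passing the token by value and
the expire flags through a pointer to `int` reflects the author's
understanding of how the existing automount daemon invokes these
commands and should be treated as an assumption.

```c
#include <linux/auto_fs.h>
#include <sys/ioctl.h>

/*
 * Acknowledge the notification identified by token on the autofs root
 * directory open as root_fd.  Returns 0 on success, -1 with errno set.
 */
static int ack_notification(int root_fd, autofs_wqt_t token, int mounted_ok)
{
	unsigned long cmd = mounted_ok ? AUTOFS_IOC_READY : AUTOFS_IOC_FAIL;

	return ioctl(root_fd, cmd, (unsigned long)token);
}

/*
 * Ask autofs to select something to expire; how is a mask of AUTOFS_EXP_*
 * flags.  This call blocks until the daemon acknowledges the notification,
 * so it must not run in the thread that services the event pipe.
 */
static int request_expire(int root_fd, int how)
{
	return ioctl(root_fd, AUTOFS_IOC_EXPIRE_MULTI, &how);
}
```

Both helpers simply fail with -1 when given a descriptor that is not
open on an autofs root directory.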

Communicating with autofs: char-device ioctls
=============================================

It is not always possible to open the root of an autofs filesystem,
particularly a *direct* mounted filesystem.  If the automount daemon
is restarted there is no way for it to regain control of existing
mounts using any of the above communication channels.  To address this
need there is a "miscellaneous" character device (major 10, minor 235)
which can be used to communicate directly with the autofs filesystem.
It requires CAP_SYS_ADMIN for access.

The 'ioctl's that can be used on this device are described in a separate
document `autofs-mount-control.txt`, and are summarised briefly here.
Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure::

	struct autofs_dev_ioctl {
		__u32 ver_major;
		__u32 ver_minor;
		__u32 size;		/* total size of data passed in
					 * including this struct */
		__s32 ioctlfd;		/* automount command fd */

		/* Command parameters */
		union {
			struct args_protover		protover;
			struct args_protosubver		protosubver;
			struct args_openmount		openmount;
			struct args_ready		ready;
			struct args_fail		fail;
			struct args_setpipefd		setpipefd;
			struct args_timeout		timeout;
			struct args_requester		requester;
			struct args_expire		expire;
			struct args_askumount		askumount;
			struct args_ismountpoint	ismountpoint;
		};

		char path[];
	};

For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target
filesystem is identified by the `path`.  All other commands identify
the filesystem by the `ioctlfd` which is a file descriptor open on the
root, and which can be returned by **OPEN_MOUNT**.

The `ver_major` and `ver_minor` are in/out parameters which check that
the requested version is supported, and report the maximum version
that the kernel module can support.
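
Because `path` is a trailing variable-length member, a request that
names a path has to be allocated with extra room and `size` adjusted
to match.  The following is a minimal sketch; `alloc_dev_ioctl` is a
hypothetical helper, and it assumes the `init_autofs_dev_ioctl()`
inline from `<linux/auto_dev-ioctl.h>` is available to fill in the
version fields.

```c
#include <stdlib.h>
#include <string.h>
#include <linux/auto_dev-ioctl.h>

/*
 * Allocate a request with room for path after the fixed-size header.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct autofs_dev_ioctl *alloc_dev_ioctl(const char *path)
{
	size_t path_len = path ? strlen(path) + 1 : 0;
	size_t size = sizeof(struct autofs_dev_ioctl) + path_len;
	struct autofs_dev_ioctl *req = malloc(size);

	if (!req)
		return NULL;
	init_autofs_dev_ioctl(req);	/* version fields, size, ioctlfd = -1 */
	req->size = (__u32)size;	/* header plus the path that follows */
	if (path_len)
		memcpy(req->path, path, path_len);
	return req;
}
```

A request built this way could then be passed to `ioctl()` on a
descriptor open on the character device, for example with the
**OPEN_MOUNT** command once the command-specific fields are set.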

Commands are:

- **AUTOFS_DEV_IOCTL_VERSION_CMD**:
	does nothing, except validate and
	set version numbers.
- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**:
	return an open file descriptor
	on the root of an autofs filesystem.  The filesystem is identified
	by name and device number, which is stored in `openmount.devid`.
	Device numbers for existing filesystems can be found in
	`/proc/self/mountinfo`.
- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**:
	same as `close(ioctlfd)`.
- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**:
	if the filesystem is in
	catatonic mode, this can provide the write end of a new pipe
	in `setpipefd.pipefd` to re-establish communication with a daemon.
	The process group of the calling process is used to identify the
	daemon.
- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**:
	`path` should be a
	name within the filesystem that has been auto-mounted on.
	On successful return, `requester.uid` and `requester.gid` will be
	the UID and GID of the process which triggered that mount.
- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**:
	Check if path is a
	mountpoint of a particular type - see separate documentation for
	details.

- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**
- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**
- **AUTOFS_DEV_IOCTL_READY_CMD**
- **AUTOFS_DEV_IOCTL_FAIL_CMD**
- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**
- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**
- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**
- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**

These all have the same
function as the similarly named **AUTOFS_IOC** ioctls, except
that **FAIL** can be given an explicit error number in `fail.status`
instead of assuming `ENOENT`, and this **EXPIRE** command
corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.

Catatonic mode
==============

As mentioned, an autofs mount can enter "catatonic" mode.  This
happens if a write to the notification pipe fails, or if it is
explicitly requested by an `ioctl`.

When entering catatonic mode, the pipe is closed and any pending
notifications are acknowledged with the error `ENOENT`.

Once in catatonic mode, attempts to access non-existing names will
result in `ENOENT` while attempts to access existing directories will
be treated in the same way as if they came from the daemon, so mount
traps will not fire.

When the filesystem is mounted a _uid_ and _gid_ can be given which
set the ownership of directories and symbolic links.  When the
filesystem is in catatonic mode, any process with a matching UID can
create directories or symlinks in the root directory, but not in other
directories.

Catatonic mode can only be left via the
**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on `/dev/autofs`.

The "ignore" mount option
=========================

The "ignore" mount option can be used to provide a generic indicator
to applications that the mount entry should be ignored when displaying
mount information.

In other OSes that provide autofs and that provide a mount list to user
space based on the kernel mount list, a no-op mount option ("ignore" is
the one used on the most common OSes) is allowed so that autofs file
system users can optionally use it.

This is intended to be used by user space programs to exclude autofs
mounts from consideration when reading the mounts list.
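
A program that lists mounts could honour the option with a check along
these lines.  This is only a sketch; `has_mount_option` and
`should_hide_mount` are hypothetical helpers that operate on the
filesystem type and the comma-separated option string of one mount
entry.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the comma-separated list opts contains the exact token name. */
static bool has_mount_option(const char *opts, const char *name)
{
	size_t name_len = strlen(name);

	while (*opts) {
		size_t tok_len = strcspn(opts, ",");

		if (tok_len == name_len && strncmp(opts, name, name_len) == 0)
			return true;
		opts += tok_len;
		if (*opts == ',')
			opts++;		/* step over the separator */
	}
	return false;
}

/* A tool listing mounts might skip entries for which this returns true. */
static bool should_hide_mount(const char *fstype, const char *opts)
{
	return strcmp(fstype, "autofs") == 0 && has_mount_option(opts, "ignore");
}
```

Matching whole tokens rather than substrings avoids treating an
unrelated option that merely contains the text "ignore" as a request to
hide the entry.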

autofs, name spaces, and shared mounts
======================================

With bind mounts and name spaces it is possible for an autofs
filesystem to appear at multiple places in one or more filesystem
name spaces.  For this to work sensibly, the autofs filesystem should
always be mounted "shared". e.g. ::

	mount --make-shared /autofs/mount/point

The automount daemon is only able to manage a single mount location for
an autofs filesystem and if mounts on that are not 'shared', other
locations will not behave as expected.  In particular, access to those
other locations will likely result in the `ELOOP` error ::

	Too many levels of symbolic links
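
Whether a mount is shared can be checked from user space by looking for
a "shared:N" tag among the optional fields of its line in
`/proc/self/mountinfo`.  The parser below is a sketch;
`mountinfo_line_is_shared` is a hypothetical helper and it assumes the
usual layout of six fixed fields, then the optional fields, then a
" - " separator.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Given one line of /proc/self/mountinfo, report whether its optional
 * fields (between the mount options and the " - " separator) include a
 * "shared:N" tag.
 */
static bool mountinfo_line_is_shared(const char *line)
{
	const char *sep = strstr(line, " - ");
	const char *p = line;
	int field;

	if (!sep)
		return false;
	/* Skip the six fixed fields that precede the optional ones. */
	for (field = 0; field < 6; field++) {
		p = strchr(p, ' ');
		if (!p || p >= sep)
			return false;
		p++;
	}
	/* Optional fields are space-separated tags such as "shared:1". */
	while (p < sep) {
		if (strncmp(p, "shared:", 7) == 0)
			return true;
		p = strchr(p, ' ');
		if (!p)
			break;
		p++;
	}
	return false;
}
```

A daemon or administrator could run such a check on the line for the
autofs mount point before relying on bind mounts or additional name
spaces.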