.. SPDX-License-Identifier: GPL-2.0

====
FUSE
====

Definitions
===========

Userspace filesystem:
  A filesystem in which data and metadata are provided by an ordinary
  userspace process.  The filesystem can be accessed normally through
  the kernel interface.

Filesystem daemon:
  The process(es) providing the data and metadata of the filesystem.

Non-privileged mount (or user mount):
  A userspace filesystem mounted by a non-privileged (non-root) user.
  The filesystem daemon runs with the privileges of the mounting
  user.  NOTE: this is not the same as mounts allowed with the "user"
  option in /etc/fstab, which is not discussed here.

Filesystem connection:
  A connection between the filesystem daemon and the kernel.  The
  connection exists until either the daemon dies or the filesystem is
  unmounted.  Note that detaching (or lazy unmounting) the filesystem
  does *not* break the connection; in this case it will exist until
  the last reference to the filesystem is released.

Mount owner:
  The user who does the mounting.

User:
  The user who is performing filesystem operations.

What is FUSE?
=============

FUSE is a userspace filesystem framework.  It consists of a kernel
module (fuse.ko), a userspace library (libfuse.*) and a mount utility
(fusermount).

One of the most important features of FUSE is allowing secure,
non-privileged mounts.  This opens up new possibilities for the use of
filesystems.  A good example is sshfs: a secure network filesystem
using the sftp protocol.

The userspace library and utilities are available from the
`FUSE homepage <https://github.com/libfuse/>`_.

Filesystem type
===============

The filesystem type given to mount(2) can be one of the following:

    fuse
      This is the usual way to mount a FUSE filesystem.  The first
      argument of the mount system call may contain an arbitrary string,
      which is not interpreted by the kernel.

    fuseblk
      The filesystem is block device based.  The first argument of the
      mount system call is interpreted as the name of the device.

Mount options
=============

fd=N
  The file descriptor to use for communication between the userspace
  filesystem and the kernel.  The file descriptor must have been
  obtained by opening the FUSE device ('/dev/fuse').

rootmode=M
  The file mode of the filesystem's root in octal representation.

user_id=N
  The numeric user id of the mount owner.

group_id=N
  The numeric group id of the mount owner.

default_permissions
  By default FUSE doesn't check file access permissions; the
  filesystem is free to implement its access policy or leave it to
  the underlying file access mechanism (e.g. in case of network
  filesystems).  This option enables permission checking, restricting
  access based on file mode.  It is most useful together with the
  'allow_other' mount option.

allow_other
  This option overrides the security measure restricting file access
  to the user mounting the filesystem.  This option is by default only
  allowed to root, but this restriction can be removed with a
  (userspace) configuration option ('user_allow_other').

max_read=N
  With this option the maximum size of read operations can be set.
  The default is infinite.  Note that the size of read requests is
  limited anyway to 32 pages (which is 128 KiB on i386).

blksize=N
  Set the block size for the filesystem.  The default is 512.  This
  option is only valid for 'fuseblk' type mounts.
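
The options above are passed to mount(2) as a single comma-separated
data string.  As a rough illustration, the sketch below formats that
string and computes the effective read request size implied by
'max_read'.  The helper names are hypothetical, 0 is used to mean "not
set" for 'blksize' and 'max_read', and the 32-page cap is taken from
the text above rather than from any particular kernel version::

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/types.h>

    /*
     * Hypothetical helper: format the option string passed as the
     * 'data' argument of mount(2).  Returns the length written, or -1
     * on an invalid option combination or a too-small buffer.
     */
    int fuse_format_opts(char *buf, size_t len, int fd, mode_t rootmode,
                         uid_t uid, gid_t gid, bool is_blk,
                         unsigned blksize)
    {
        int n;

        /* 'blksize' is only valid for 'fuseblk' mounts. */
        if (blksize != 0 && !is_blk)
            return -1;

        /* rootmode is given in octal, e.g. 40755 for a 0755 directory. */
        n = snprintf(buf, len, "fd=%d,rootmode=%o,user_id=%u,group_id=%u",
                     fd, (unsigned)rootmode, (unsigned)uid, (unsigned)gid);
        if (n < 0 || (size_t)n >= len)
            return -1;

        if (blksize != 0) {
            int m = snprintf(buf + n, len - n, ",blksize=%u", blksize);
            if (m < 0 || (size_t)m >= len - n)
                return -1;
            n += m;
        }
        return n;
    }

    /*
     * Effective read request size: max_read == 0 means "not set"
     * (infinite); requests are capped at 32 pages as stated above.
     */
    unsigned long fuse_effective_max_read(unsigned long max_read,
                                          unsigned long page_size)
    {
        unsigned long cap = 32 * page_size;

        if (max_read == 0 || max_read > cap)
            return cap;
        return max_read;
    }

With these helpers, a daemon that had opened '/dev/fuse' as descriptor 3
would pass "fd=3,rootmode=40755,user_id=1000,group_id=1000" as the data
argument of a 'fuse' mount, and with 4 KiB pages an unset 'max_read'
yields a 131072-byte limit.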

Control filesystem
==================

There's a control filesystem for FUSE, which can be mounted by::

  mount -t fusectl none /sys/fs/fuse/connections

Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.

Under the fuse control filesystem each connection has a directory
named by a unique number.

For each connection the following files exist within this directory:

	waiting
	  The number of requests which are waiting to be transferred to
	  userspace or are being processed by the filesystem daemon.  If there
	  is no filesystem activity and 'waiting' is non-zero, then the
	  filesystem is hung or deadlocked.

	abort
	  Writing anything into this file will abort the filesystem
	  connection.  This means that all waiting requests will be aborted
	  and an error returned for all aborted and new requests.

Only the owner of the mount may read or write these files.
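
As an example of using these files, a monitoring tool could poll
'waiting' and, if it stays non-zero while there is no filesystem
activity, write to 'abort'.  The following is a minimal sketch using
plain open(2), read(2) and write(2); the helper names and the
connection number are hypothetical, and the calls are subject to the
ownership rule above::

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Build "<base>/<conn>/<file>", e.g. .../connections/42/waiting. */
    int fusectl_path(char *buf, size_t len, const char *base,
                     unsigned conn, const char *file)
    {
        int n = snprintf(buf, len, "%s/%u/%s", base, conn, file);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* Return the number of waiting requests, or -1 on error. */
    long fusectl_read_waiting(const char *path)
    {
        char buf[32];
        ssize_t n;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        n = read(fd, buf, sizeof(buf) - 1);
        close(fd);
        if (n <= 0)
            return -1;
        buf[n] = '\0';
        return strtol(buf, NULL, 10);
    }

    /* Writing anything to 'abort' aborts the connection. */
    int fusectl_abort(const char *path)
    {
        ssize_t n;
        int fd = open(path, O_WRONLY);

        if (fd < 0)
            return -1;
        n = write(fd, "1", 1);
        close(fd);
        return n == 1 ? 0 : -1;
    }

For instance, fusectl_path() with base "/sys/fs/fuse/connections",
connection 42 and file "abort" yields the path whose write aborts that
connection.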

Interrupting filesystem operations
==================================

If a process issuing a FUSE filesystem request is interrupted, the
following will happen:

  -  If the request has not yet been sent to userspace AND the signal is
     fatal (SIGKILL or unhandled fatal signal), then the request is
     dequeued and returns immediately.

  -  If the request has not yet been sent to userspace AND the signal is
     not fatal, then an interrupted flag is set for the request.  When
     the request has been successfully transferred to userspace and
     this flag is set, an INTERRUPT request is queued.

  -  If the request has already been sent to userspace, then an
     INTERRUPT request is queued.

INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.
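
The three cases and the precedence rule can be summarized as two small
decision functions.  This is a userspace model with hypothetical names,
not kernel code::

    #include <stdbool.h>

    /* What happens to a request whose caller receives a signal. */
    enum intr_action {
        INTR_DEQUEUE,          /* fatal signal, not yet sent: drop it */
        INTR_FLAG_THEN_QUEUE,  /* set flag; queue INTERRUPT once sent */
        INTR_QUEUE_INTERRUPT,  /* already sent: queue INTERRUPT now */
    };

    enum intr_action on_signal(bool sent_to_userspace, bool fatal)
    {
        if (!sent_to_userspace)
            return fatal ? INTR_DEQUEUE : INTR_FLAG_THEN_QUEUE;
        return INTR_QUEUE_INTERRUPT;
    }

    /* Which queue the daemon's next read is served from. */
    enum dev_queue { QUEUE_NONE, QUEUE_INTERRUPTS, QUEUE_PENDING };

    enum dev_queue next_queue(unsigned n_interrupts, unsigned n_pending)
    {
        /* INTERRUPT requests are handed out before ordinary ones. */
        if (n_interrupts > 0)
            return QUEUE_INTERRUPTS;
        return n_pending > 0 ? QUEUE_PENDING : QUEUE_NONE;
    }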

The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the *original* request, with
the error set to EINTR.

It is also possible that there's a race between processing the
original request and its INTERRUPT request.  There are two possibilities:

  1. The INTERRUPT request is processed before the original request is
     processed.

  2. The INTERRUPT request is processed after the original request has
     been answered.

If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error.  In case 1)
the INTERRUPT request will be requeued.  In case 2) the INTERRUPT reply
will be ignored.
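
On the daemon side, these rules might be handled as sketched below.  The
in-flight table, the reply structure and handle_interrupt() are
hypothetical; as in the FUSE reply header, errors are carried as
negative errno values.  The timeout and the wait for new requests are
omitted for brevity::

    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_INFLIGHT 64

    /* Hypothetical table of requests the daemon has not answered yet. */
    struct inflight {
        uint64_t unique[MAX_INFLIGHT];
        bool used[MAX_INFLIGHT];
    };

    struct reply {
        uint64_t unique;  /* which request this reply answers */
        int error;        /* 0 or a negative errno value */
    };

    /*
     * Handle an INTERRUPT (ID intr_unique) targeting request 'target'.
     * If the original is still in flight, honor the interrupt by
     * answering the *original* with -EINTR.  Otherwise (after waiting,
     * not shown) answer the INTERRUPT itself with -EAGAIN.
     */
    struct reply handle_interrupt(struct inflight *t, uint64_t intr_unique,
                                  uint64_t target)
    {
        for (size_t i = 0; i < MAX_INFLIGHT; i++) {
            if (t->used[i] && t->unique[i] == target) {
                t->used[i] = false;  /* the original is now answered */
                return (struct reply){ .unique = target, .error = -EINTR };
            }
        }
        return (struct reply){ .unique = intr_unique, .error = -EAGAIN };
    }

A daemon that chooses to ignore INTERRUPT requests entirely, which the
text above permits, would simply never call such a function.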

Aborting a filesystem connection
================================

It is possible to get into certain situations where the filesystem is
not responding.  Reasons for this may be:

  a) Broken userspace filesystem implementation

  b) Network connection down

  c) Accidental deadlock

  d) Malicious deadlock

(For more on c) and d), see later sections.)

In any of these cases it may be useful to abort the connection to
the filesystem.  There are several ways to do this:

  - Kill the filesystem daemon.  Works in cases a) and b).

  - Kill the filesystem daemon and all users of the filesystem.  Works
    in all cases except some malicious deadlocks.

  - Use a forced unmount (umount -f).  Works in all cases, but only if
    the filesystem is still attached (it hasn't been lazily unmounted).

  - Abort the filesystem through the FUSE control filesystem.  This is
    the most powerful method and always works.
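
The last two methods can be combined in a small tool that first tries a
forced unmount and, if that fails (for example because the filesystem
was already lazily unmounted), falls back to the control filesystem.
This is a sketch, not an existing utility: umount2(2) with MNT_FORCE is
the system call behind a forced unmount, while the helper name and the
caller-supplied path of the 'abort' file are assumptions::

    #include <fcntl.h>
    #include <sys/mount.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: returns 0 if the filesystem was force
     * unmounted, 1 if the connection was aborted through fusectl
     * instead, and -1 if both attempts failed.
     */
    int fuse_force_abort(const char *mountpoint, const char *abort_path)
    {
        int fd;
        ssize_t n;

        /* Only works while the filesystem is still attached. */
        if (umount2(mountpoint, MNT_FORCE) == 0)
            return 0;

        /* Fall back to the control filesystem, which always works. */
        fd = open(abort_path, O_WRONLY);
        if (fd < 0)
            return -1;
        n = write(fd, "1", 1);
        close(fd);
        return n == 1 ? 1 : -1;
    }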

How do non-privileged mounts work?
==================================

Since the mount() system call is a privileged operation, a helper
program (fusermount) is needed, which is installed setuid root.

The implication of providing non-privileged mounts is that the mount
owner must not be able to use this capability to compromise the
system.  Obvious requirements arising from this are:

 A) mount owner should not be able to get elevated privileges with the
    help of the mounted filesystem

 B) mount owner should not get illegitimate access to information from
    other users' and the super user's processes

 C) mount owner should not be able to induce undesired behavior in
    other users' or the super user's processes

How are requirements fulfilled?
===============================

 A) The mount owner could gain elevated privileges by either:

    1. creating a filesystem containing a device file, then opening this device

    2. creating a filesystem containing a suid or sgid application, then executing this application

    The solution is to disallow opening device files and to ignore
    setuid and setgid bits when executing programs.  To ensure this,
    fusermount always adds "nosuid" and "nodev" to the mount options
    for non-privileged mounts.

 B) If another user is accessing files or directories in the
    filesystem, the filesystem daemon serving requests can record the
    exact sequence and timing of operations performed.  This
    information is otherwise inaccessible to the mount owner, so this
    counts as an information leak.

    The solution to this problem will be presented in point 2) of C).

 C) There are several ways in which the mount owner can induce
    undesired behavior in other users' processes, such as:

     1) mounting a filesystem over a file or directory which the mount
        owner would otherwise not be able to modify (or could only
        make limited modifications to).

        This is solved in fusermount by checking the access
        permissions on the mountpoint and only allowing the mount if
        the mount owner can do unlimited modification (has write
        access to the mountpoint, and the mountpoint is not a "sticky"
        directory).

     2) Even if 1) is solved, the mount owner can change the behavior
        of other users' processes.

         i) It can slow down or indefinitely delay the execution of a
            filesystem operation, creating a DoS against the user or the
            whole system.  For example, a suid application that locks a
            system file and then accesses a file on the mount owner's
            filesystem could be stopped, causing the system file to be
            locked forever.

         ii) It can present files or directories of unlimited length, or
             directory structures of unlimited depth, possibly causing a
             system process to eat up disk space, memory or other
             resources, again causing *DoS*.

	The solution to this, as well as to B), is to deny filesystem
	access to processes that the mount owner could not otherwise
	monitor or manipulate.  Since a mount owner who can ptrace a
	process could do all of the above without using a FUSE mount,
	the same criteria as used by ptrace can be used to check whether
	a process is allowed to access the filesystem.

	Note that the *ptrace* check is not strictly necessary to
	prevent C/2/i; it is enough to check whether the mount owner has
	sufficient privilege to send a signal to the process accessing
	the filesystem, since *SIGSTOP* can be used to get a similar
	effect.
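
The rules in A) and C) can be sketched in C as follows.  This
illustrates the rules as stated above, not the actual code of
fusermount or the kernel; the function names are hypothetical, and the
"ptrace-like" criterion is approximated by requiring the process's
real, effective and saved user and group IDs all to match the mount
owner's::

    #include <stdbool.h>
    #include <sys/mount.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* A): non-privileged mounts always get nosuid and nodev. */
    unsigned long user_mount_flags(unsigned long requested)
    {
        return requested | MS_NOSUID | MS_NODEV;
    }

    /*
     * C/1: allow mounting only on a writable mountpoint that is not a
     * sticky directory.  access(2) checks against the real UID, which
     * for a setuid-root helper is the invoking user.
     */
    bool mountpoint_allowed(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return false;
        if (S_ISDIR(st.st_mode) && (st.st_mode & S_ISVTX))
            return false;
        return access(path, W_OK) == 0;
    }

    /* Simplified process credentials (hypothetical structure). */
    struct creds {
        uid_t uid, euid, suid;
        gid_t gid, egid, sgid;
    };

    /*
     * B) and C/2: a process may use the mount only if the mount owner
     * could ptrace it anyway; approximated as all IDs matching.
     */
    bool creds_permit_access(const struct creds *c, uid_t owner,
                             gid_t group)
    {
        return c->uid == owner && c->euid == owner && c->suid == owner &&
               c->gid == group && c->egid == group && c->sgid == group;
    }

A process running a setuid-root program has an effective UID that
differs from the mount owner's, so it fails the last check and is
denied access, which is the situation the suid example in C/2/i is
concerned with.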

I think these limitations are unacceptable?
===========================================

If a sysadmin trusts the users enough, or can ensure through other
measures that system processes will never enter non-privileged
mounts, they can relax the last limitation in several ways:

  - With the 'user_allow_other' config option.  If this config option is
    set, the mounting user can add the 'allow_other' mount option, which
    disables the check for other users' processes.

    User namespaces have an unintuitive interaction with 'allow_other':
    an unprivileged user, normally restricted from mounting with
    'allow_other', could do so in a user namespace where they're
    privileged.  If any process could access such an 'allow_other' mount,
    this would give the mounting user the ability to manipulate
    processes in user namespaces where they're unprivileged.  For this
    reason 'allow_other' restricts access to users in the same userns
    or a descendant.

  - With the 'allow_sys_admin_access' module option.  If this option is
    set, the super user's processes have unrestricted access to mounts
    irrespective of the allow_other setting or the user namespace of the
    mounting user.

Note that both of these relaxations expose the system to potential
information leak or *DoS* as described in points B and C/2/i-ii in the
preceding section.
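
Combining the default check with the two relaxations, the access
decision can be modeled as below.  The structures and field names are
hypothetical: 'owner_creds_match' stands for the per-process check from
the preceding section, and 'in_mount_userns' for "the process is in the
mounting user's namespace or a descendant"::

    #include <stdbool.h>

    /* Hypothetical summary of one mount's configuration. */
    struct mount_policy {
        bool allow_other;             /* 'allow_other' mount option */
        bool allow_sys_admin_access;  /* module option */
    };

    /* Hypothetical summary of the accessing process. */
    struct accessor {
        bool owner_creds_match;  /* passes the ptrace-like check */
        bool in_mount_userns;    /* same user namespace or descendant */
        bool is_sys_admin;       /* super user */
    };

    bool may_access(const struct mount_policy *m, const struct accessor *a)
    {
        bool allow;

        if (m->allow_other)
            allow = a->in_mount_userns;  /* userns limit still applies */
        else
            allow = a->owner_creds_match;

        /* allow_sys_admin_access: the super user bypasses both checks. */
        if (!allow && m->allow_sys_admin_access && a->is_sys_admin)
            allow = true;
        return allow;
    }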

Kernel - userspace interface
============================

The following diagram shows how a filesystem operation (in this
example unlink) is performed in FUSE. ::


 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |                                    |  >sys_read()
 |                                    |    >fuse_dev_read()
 |                                    |      >request_wait()
 |                                    |        [sleep on fc->waitq]
 |                                    |
 |  >sys_unlink()                     |
 |    >fuse_unlink()                  |
 |      [get request from             |
 |       fc->unused_list]             |
 |      >request_send()               |
 |        [queue req on fc->pending]  |
 |        [wake up fc->waitq]         |        [woken up]
 |        >request_wait_answer()      |
 |          [sleep on req->waitq]     |
 |                                    |      <request_wait()
 |                                    |      [remove req from fc->pending]
 |                                    |      [copy req to read buffer]
 |                                    |      [add req to fc->processing]
 |                                    |    <fuse_dev_read()
 |                                    |  <sys_read()
 |                                    |
 |                                    |  [perform unlink]
 |                                    |
 |                                    |  >sys_write()
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |          [woken up]                |      [wake up req->waitq]
 |                                    |    <fuse_dev_write()
 |                                    |  <sys_write()
 |        <request_wait_answer()      |
 |      <request_send()               |
 |      [add request to               |
 |       fc->unused_list]             |
 |    <fuse_unlink()                  |
 |  <sys_unlink()                     |

.. note:: Everything in the description above is greatly simplified.
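
The request bookkeeping in the diagram (a request moves from
fc->pending to fc->processing when the daemon reads it, and is removed
from fc->processing when the daemon writes the reply) can be modeled
with two small arrays keyed by a request ID.  This is a userspace sketch
with hypothetical names; the real kernel structures are different::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NREQ 16

    /* Hypothetical model of one request and of the connection 'fc'. */
    struct req {
        uint64_t unique;
        bool answered;
    };

    struct conn {
        struct req *pending[NREQ];     /* FIFO waiting for the daemon */
        size_t npending;
        struct req *processing[NREQ];  /* read, not yet answered */
        size_t nprocessing;
    };

    /* request_send(): queue on fc->pending (the caller then sleeps). */
    bool conn_send(struct conn *fc, struct req *r)
    {
        if (fc->npending == NREQ)
            return false;
        fc->pending[fc->npending++] = r;
        return true;
    }

    /* fuse_dev_read(): move the oldest pending request to processing. */
    struct req *conn_dev_read(struct conn *fc)
    {
        struct req *r;

        if (fc->npending == 0 || fc->nprocessing == NREQ)
            return NULL;
        r = fc->pending[0];
        for (size_t i = 1; i < fc->npending; i++)
            fc->pending[i - 1] = fc->pending[i];
        fc->npending--;
        fc->processing[fc->nprocessing++] = r;
        return r;
    }

    /* fuse_dev_write(): look up by ID, mark answered, remove it. */
    bool conn_dev_write(struct conn *fc, uint64_t unique)
    {
        for (size_t i = 0; i < fc->nprocessing; i++) {
            if (fc->processing[i]->unique == unique) {
                /* stands in for waking up req->waitq */
                fc->processing[i]->answered = true;
                fc->processing[i] = fc->processing[--fc->nprocessing];
                return true;
            }
        }
        return false;  /* unknown or already answered */
    }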

There are a couple of ways in which to deadlock a FUSE filesystem.
Since we are talking about unprivileged userspace programs,
something must be done about these.

**Scenario 1 - Simple deadlock**::

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |  >sys_unlink("/mnt/fuse/file")     |
 |    [acquire inode semaphore        |
 |     for "file"]                    |
 |    >fuse_unlink()                  |
 |      [sleep on req->waitq]         |
 |                                    |  <sys_read()
 |                                    |  >sys_unlink("/mnt/fuse/file")
 |                                    |    [acquire inode semaphore
 |                                    |     for "file"]
 |                                    |    *DEADLOCK*

The solution for this is to allow the filesystem to be aborted.

**Scenario 2 - Tricky deadlock**

This one needs a carefully crafted filesystem.  It's a variation on
the above, except that the call back into the filesystem is not
explicit but is caused by a page fault. ::

 |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
 |                                    |
 |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
 |  [mmap fd to 'addr']               |
 |  [close fd]                        |  [FLUSH triggers 'magic' flag]
 |  [read a byte from addr]           |
 |    >do_page_fault()                |
 |      [find or create page]         |
 |      [lock page]                   |
 |      >fuse_readpage()              |
 |         [queue READ request]       |
 |         [sleep on req->waitq]      |
 |                                    |  [read request to buffer]
 |                                    |  [create reply header before addr]
 |                                    |  >sys_write(addr - headerlength)
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |                                    |        >do_page_fault()
 |                                    |           [find or create page]
 |                                    |           [lock page]
 |                                    |           * DEADLOCK *

The solution is basically the same as above.

An additional problem is that while the write buffer is being copied
to the request, the request must not be interrupted/aborted.  This is
because the destination address of the copy may not be valid after the
request has returned.

This is solved by doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted in with
get_user_pages().  The 'req->locked' flag indicates when the copy is
taking place, and abort is delayed until this flag is unset.
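
The interaction between 'req->locked' and abort can be modeled as
follows: an abort that arrives while the copy is in progress is only
recorded, and takes effect once the copy has finished.  The structure
and function names are hypothetical and do not mirror the kernel's
implementation::

    #include <stdbool.h>

    /* Hypothetical model of the state relevant to aborting a request. */
    struct req_state {
        bool locked;         /* reply is being copied into the request */
        bool abort_pending;  /* abort requested while locked */
        bool aborted;
    };

    /* Abort the request now, or defer it if the copy is in progress. */
    void req_abort(struct req_state *r)
    {
        if (r->locked)
            r->abort_pending = true;
        else
            r->aborted = true;
    }

    /* Start copying the write buffer; refused if already aborted. */
    bool req_lock(struct req_state *r)
    {
        if (r->aborted)
            return false;
        r->locked = true;
        return true;
    }

    /* Copy finished: clear the flag and apply any deferred abort. */
    void req_unlock(struct req_state *r)
    {
        r->locked = false;
        if (r->abort_pending) {
            r->abort_pending = false;
            r->aborted = true;
        }
    }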
427