.. SPDX-License-Identifier: GPL-2.0

===============================================================
Inotify - A Powerful yet Simple File Change Notification System
===============================================================



Document started 15 Mar 2005 by Robert Love <rml@novell.com>

Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>

	- Deleted the obsolete interface description; refer to the man pages
	  for the user interface.

(i) Rationale

Q:
   What is the design decision behind not tying the watch to the open fd of
   the watched object?

A:
   Watches are associated with an open inotify device, not an open file.
   This solves the primary problem with dnotify: keeping the file open pins
   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
   for use on a desktop system with removable media as the media cannot be
   unmounted.  Watching a file should not require that it be open.

Q:
   What is the design decision behind using an fd-per-instance as opposed to
   an fd-per-watch?

A:
   An fd-per-watch quickly consumes more file descriptors than are allowed,
   more fd's than are feasible to manage, and more fd's than are optimally
   select()-able.  Yes, root can bump the per-process fd limit and yes, users
   can use epoll, but requiring both is a silly and extraneous requirement.
   A watch consumes less memory than an open file; separating the number
   spaces is thus sensible.  The current design is what user-space developers
   want: Users initialize inotify, once, and add n watches, requiring but one
   fd and no twiddling with fd limits.  Initializing an inotify instance two
   thousand times is silly.  If we can implement user-space's preferences
   cleanly--and we can, the idr layer makes stuff like this trivial--then we
   should.
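
   As a rough illustration of the one-fd, many-watches model, here is a
   minimal user-space sketch.  The helper name ``watch_many()`` and the event
   mask are assumptions made for this example; only inotify_init1(2),
   inotify_add_watch(2) and close(2) come from the real interface.

   .. code-block:: c

      #include <stddef.h>
      #include <sys/inotify.h>
      #include <unistd.h>

      /*
       * Hypothetical helper: create one inotify instance and add one watch
       * per path.  Returns the instance fd, or -1 on failure.  The values
       * stored in wds[] are watch descriptors, not file descriptors.
       */
      int watch_many(const char *const paths[], size_t n, int wds[])
      {
          int fd = inotify_init1(IN_CLOEXEC);
          size_t i;

          if (fd < 0)
              return -1;

          for (i = 0; i < n; i++) {
              /* The mask is an arbitrary choice for this sketch. */
              wds[i] = inotify_add_watch(fd, paths[i],
                                         IN_MODIFY | IN_CREATE | IN_DELETE);
              if (wds[i] < 0) {
                  close(fd);
                  return -1;
              }
          }
          return fd;
      }

   However many paths are passed in, the caller ends up holding a single file
   descriptor; the watch descriptors in ``wds`` live in their own number
   space.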

   There are other good arguments.  With a single fd, there is a single
   item to block on, which is mapped to a single queue of events.  The single
   fd returns all watch events and also any potential out-of-band data.  If
   every fd were a separate watch,

   - There would be no way to get event ordering.  Events on file foo and
     file bar would pop poll() on both fd's, but there would be no way to tell
     which happened first.  A single queue trivially gives you ordering.  Such
     ordering is crucial to existing applications such as Beagle.  Imagine
     "mv a b ; mv b a" events without ordering.

   - We'd have to maintain n fd's and n internal queues with state,
     versus just one.  It is a lot messier in the kernel.  A single, linear
     queue is the data structure that makes sense.

   - User-space developers prefer the current API.  The Beagle guys, for
     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
     to manage and block on 1000 fd's via select?

   - There would be no way to get out-of-band data.

   - 1024 is still too low.  ;-)

   When you talk about designing a file change notification system that
   scales to 1000s of directories, juggling 1000s of fd's just does not seem
   the right interface.  It is too heavy.
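
   The ordering argument can be tried out with a small user-space sketch that
   performs "mv a b ; mv b a" in a scratch directory and reads the events back
   from one queue.  The function name ``check_rename_order()``, the temporary
   directory template and the buffer size are assumptions made for this
   example, not part of the inotify interface.

   .. code-block:: c

      #include <fcntl.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/inotify.h>
      #include <unistd.h>

      /*
       * Hypothetical demo: rename a -> b, then b -> a, inside a fresh
       * temporary directory and check that the single inotify queue reports
       * the four events in the order they happened.  Returns 0 if the order
       * matches, -1 otherwise.
       */
      int check_rename_order(void)
      {
          static const char *const expect_name[] = { "a", "b", "b", "a" };
          static const uint32_t expect_mask[] = {
              IN_MOVED_FROM, IN_MOVED_TO, IN_MOVED_FROM, IN_MOVED_TO,
          };
          char buf[4096]
              __attribute__((aligned(__alignof__(struct inotify_event))));
          char dir[] = "/tmp/inotify-order-XXXXXX";
          char a[64], b[64];
          const struct inotify_event *ev;
          ssize_t len;
          char *p;
          int fd, tmp, i = 0, ret = -1;

          if (!mkdtemp(dir))
              return -1;
          snprintf(a, sizeof(a), "%s/a", dir);
          snprintf(b, sizeof(b), "%s/b", dir);

          tmp = open(a, O_CREAT | O_WRONLY, 0600);
          if (tmp < 0)
              goto out_rmdir;
          close(tmp);

          fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
          if (fd < 0)
              goto out_unlink;
          if (inotify_add_watch(fd, dir, IN_MOVED_FROM | IN_MOVED_TO) < 0)
              goto out_close;

          if (rename(a, b) || rename(b, a))
              goto out_close;

          /* One read() drains the queue; events appear in queue order. */
          len = read(fd, buf, sizeof(buf));
          for (p = buf; len > 0 && p < buf + len && i < 4; i++) {
              ev = (const struct inotify_event *)p;
              if (!(ev->mask & expect_mask[i]) ||
                  strcmp(ev->name, expect_name[i]) != 0)
                  break;
              p += sizeof(*ev) + ev->len;
          }
          if (i == 4)
              ret = 0;

      out_close:
          close(fd);
      out_unlink:
          unlink(a);
          unlink(b);
      out_rmdir:
          rmdir(dir);
          return ret;
      }

   With one fd per watched object, the same two renames would merely mark
   several descriptors readable, and the reader would have to guess which
   rename came first.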

   Additionally, it _is_ possible to initialize more than one instance and
   juggle more than one queue and thus more than one associated fd.  There
   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
   process can easily want more than one queue.

Q:
   Why the system call approach?

A:
   The poor user-space interface is the second biggest problem with dnotify.
   Signals are a terrible, terrible interface for file notification.  Or for
   anything, for that matter.  The ideal solution, from all perspectives, is a
   file descriptor-based one that allows basic file I/O and poll/select.
   Obtaining the fd and managing the watches could have been done either via a
   device file or a family of new system calls.  We decided to implement a
   family of system calls because that is the preferred approach for new kernel
   interfaces.  The only real difference was whether we wanted to use open(2)
   and ioctl(2) or a couple of new system calls.  System calls beat ioctls.
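
   Because the instance is an ordinary file descriptor, waiting for events
   needs nothing beyond poll(2).  The following is a minimal sketch under that
   assumption; the helper name ``wait_for_events()`` and its return
   convention are made up for this example.

   .. code-block:: c

      #include <poll.h>
      #include <sys/inotify.h>

      /*
       * Hypothetical helper: wait up to timeout_ms milliseconds for the
       * inotify fd to become readable.  Returns 1 if events are queued,
       * 0 on timeout and -1 on error.
       */
      int wait_for_events(int inotify_fd, int timeout_ms)
      {
          struct pollfd pfd = { .fd = inotify_fd, .events = POLLIN };

          if (poll(&pfd, 1, timeout_ms) < 0)
              return -1;

          return (pfd.revents & POLLIN) ? 1 : 0;
      }

   The same descriptor can sit in a select() or epoll set next to sockets and
   pipes, and a plain read(2) then returns the queued events; no signal
   handler is involved.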