.. SPDX-License-Identifier: GPL-2.0

===============================================================
Inotify - A Powerful yet Simple File Change Notification System
===============================================================

Document started 15 Mar 2005 by Robert Love <rml@novell.com>

Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>

   - Deleted the obsolete interface; refer to the man pages for the user
     interface.

(i) Rationale

Q:
   What is the design decision behind not tying the watch to the open fd of
   the watched object?

A:
   Watches are associated with an open inotify instance, not an open file.
   This solves the primary problem with dnotify: keeping the file open pins
   the file and thus, worse, pins the mount. Dnotify is therefore infeasible
   for use on a desktop system with removable media as the media cannot be
   unmounted. Watching a file should not require that it be open.

Q:
   What is the design decision behind using an fd-per-instance as opposed to
   an fd-per-watch?

A:
   An fd-per-watch quickly consumes more file descriptors than are allowed,
   more fd's than are feasible to manage, and more fd's than are optimally
   select()-able. Yes, root can bump the per-process fd limit and yes, users
   can use epoll, but requiring both is a silly and extraneous requirement.
   A watch consumes less memory than an open file, so separating the number
   spaces is sensible. The current design is what user-space developers
   want: users initialize inotify once and add n watches, requiring but one
   fd and no twiddling with fd limits. Initializing an inotify instance two
   thousand times is silly. If we can implement user-space's preferences
   cleanly--and we can, the idr layer makes stuff like this trivial--then we
   should.

   There are other good arguments. With a single fd, there is a single
   item to block on, which is mapped to a single queue of events. The single
   fd returns all watch events and also any potential out-of-band data. If
   every fd were a separate watch,

   - There would be no way to get event ordering. Events on file foo and
     file bar would pop poll() on both fd's, but there would be no way to tell
     which happened first. A single queue trivially gives you ordering.
     Such ordering is crucial to existing applications such as Beagle. Imagine
     "mv a b ; mv b a" events without ordering.

   - We'd have to maintain n fd's and n internal queues with state,
     versus just one. It is a lot messier in the kernel. A single, linear
     queue is the data structure that makes sense.

   - User-space developers prefer the current API. The Beagle guys, for
     example, love it. Trust me, I asked. It is not a surprise: Who'd want
     to manage and block on 1000 fd's via select?

   - No way to get out of band data.

   - 1024 is still too low. ;-)

   When you talk about designing a file change notification system that
   scales to 1000s of directories, juggling 1000s of fd's just does not seem
   the right interface. It is too heavy.

   Additionally, it _is_ possible to have more than one instance and
   juggle more than one queue and thus more than one associated fd. There
   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
   process can easily want more than one queue.

Q:
   Why the system call approach?

A:
   The poor user-space interface is the second biggest problem with dnotify.
   Signals are a terrible, terrible interface for file notification. Or for
   anything, for that matter. The ideal solution, from all perspectives, is a
   file descriptor-based one that allows basic file I/O and poll/select.
   Obtaining the fd and managing the watches could have been done either via a
   device file or a family of new system calls. We decided to implement a
   family of system calls because that is the preferred approach for new kernel
   interfaces. The only real difference was whether we wanted to use open(2)
   and ioctl(2) or a couple of new system calls. System calls beat ioctls.
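
The single-fd model argued for above can be sketched from user space as
follows. This is only an illustration, assuming a Linux system with a
writable /tmp: it uses inotify_init1(2), inotify_add_watch(2), poll(2) and
read(2) as documented in the man pages, while the helpers touch(),
cleanup() and inotify_order_demo() and the scratch directory names are
hypothetical and not part of any kernel or libc interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file dir/name. Returns 0 on success. */
static int touch(const char *dir, const char *name)
{
	char path[PATH_MAX];
	int fd;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}

/* Hypothetical helper: remove dir/name and then dir itself. */
static void cleanup(const char *dir, const char *name)
{
	char path[PATH_MAX];

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	unlink(path);
	rmdir(dir);
}

/*
 * Watch two scratch directories through ONE inotify fd, create a file in
 * directory b and then in directory a, and check that the single queue
 * reports the two events in that order. Returns 0 on success, -1 otherwise.
 */
int inotify_order_demo(void)
{
	char dir_a[] = "/tmp/inotify-a-XXXXXX";
	char dir_b[] = "/tmp/inotify-b-XXXXXX";
	char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
	struct pollfd pfd = { .fd = -1, .events = POLLIN };
	int wd_a, wd_b, order[2] = { -1, -1 }, seen = 0, ret = -1;
	ssize_t len;

	if (!mkdtemp(dir_a) || !mkdtemp(dir_b))
		goto out;

	pfd.fd = inotify_init1(IN_CLOEXEC);	/* one fd per instance */
	if (pfd.fd < 0)
		goto out;
	wd_a = inotify_add_watch(pfd.fd, dir_a, IN_CREATE);	/* n watches, */
	wd_b = inotify_add_watch(pfd.fd, dir_b, IN_CREATE);	/* still one fd */
	if (wd_a < 0 || wd_b < 0)
		goto out;

	/* Generate the events: directory b first, then directory a. */
	if (touch(dir_b, "first") || touch(dir_a, "second"))
		goto out;

	/* A single item to block on, whatever the number of watches. */
	if (poll(&pfd, 1, 1000) <= 0)
		goto out;
	len = read(pfd.fd, buf, sizeof(buf));
	if (len <= 0)
		goto out;

	/* Walk the variable-length event records in queue order. */
	for (char *p = buf; p < buf + len && seen < 2; ) {
		const struct inotify_event *ev = (const struct inotify_event *)p;

		assert(ev->wd == wd_a || ev->wd == wd_b);
		order[seen++] = ev->wd;
		p += sizeof(*ev) + ev->len;
	}
	if (seen == 2 && order[0] == wd_b && order[1] == wd_a)
		ret = 0;
out:
	if (pfd.fd >= 0)
		close(pfd.fd);
	cleanup(dir_a, "second");
	cleanup(dir_b, "first");
	return ret;
}
```

Calling inotify_order_demo() from a small test program should return 0 where
inotify is available. The watch descriptors come back in the order the files
were created, which is the ordering property that one fd per watch could not
provide.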