.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements - the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure - struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of the fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure -

1. All references to the fdtable must be done through
   the files_fdtable() macro::

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n <= fdt->max_fds)
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fd_rcu() or the files_lookup_fd_rcu() API. These
   take care of the barrier requirements of a lock-free lookup.

   An example::

        struct file *file;

        rcu_read_lock();
        file = lookup_fd_rcu(fd);
        if (file) {
                ...
        }
        ....
        rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count::

        rcu_read_lock();
        file = files_lookup_fd_rcu(files, fd);
        if (file) {
                if (atomic_long_inc_not_zero(&file->f_count))
                        *fput_needed = 1;
                else
                        /* Didn't get the reference, someone's freed */
                        file = NULL;
        }
        rcu_read_unlock();
        ....
        return file;

   atomic_long_inc_not_zero() detects whether the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fd_rcu()/files_lookup_fd_rcu(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the fd table, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, load the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
                spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
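
The "increment unless zero" step in rule 5 can be illustrated outside the
kernel. The following is only a minimal userspace sketch, assuming C11
atomics in place of the kernel's atomic_long_t helpers; struct demo_file,
demo_inc_not_zero(), demo_put() and demo_get() are hypothetical names
invented for this illustration and are not kernel APIs. It shows why a
look-up that races with the last put must fail rather than resurrect the
object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct file; only the refcount matters. */
struct demo_file {
	atomic_long f_count;
};

/*
 * Take a reference unless the count is already zero.
 * Returns true if the count was incremented, false if it was zero.
 */
static bool demo_inc_not_zero(atomic_long *count)
{
	long old = atomic_load(count);

	while (old != 0) {
		/* On failure, 'old' is updated to the current value and we retry. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool demo_put(struct demo_file *f)
{
	long old = atomic_fetch_sub(&f->f_count, 1);

	assert(old > 0);	/* dropping a reference we never held is a bug */
	return old == 1;
}

/* Model of the look-up in rule 5: NULL if the object is already being freed. */
static struct demo_file *demo_get(struct demo_file *f)
{
	if (f && demo_inc_not_zero(&f->f_count))
		return f;
	return NULL;
}
```

Once demo_put() has returned true for an object, every later demo_get() on
it returns NULL, which mirrors fget()/fget_light() failing when ->f_count
has already reached zero. The sketch does not model RCU itself: in the
kernel, it is the RCU read-side critical section that keeps the memory
behind the pointer valid while the count is being examined.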