.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:

..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================

This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and issues about which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU General Public License,
v2.
The primary author is Darrick J. Wong.

This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:

1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.
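
A minimal sketch of what cross-referencing means, assuming a toy model in
which forward and reverse mappings are plain tuples.
The names ``Extent``, ``ReverseMapping``, and ``cross_reference`` are
hypothetical and do not correspond to XFS structures or functions.

.. code-block:: python

    from typing import NamedTuple


    class Extent(NamedTuple):
        """Forward mapping (hypothetical): a file owns a run of disk blocks."""
        owner: int          # inode number
        start_block: int
        length: int


    class ReverseMapping(NamedTuple):
        """Reverse mapping (hypothetical): a run of disk blocks names its owner."""
        start_block: int
        length: int
        owner: int


    def cross_reference(extents, rmaps):
        """Compare the two record types and report records without a partner."""
        forward = {(e.start_block, e.length, e.owner) for e in extents}
        reverse = {(r.start_block, r.length, r.owner) for r in rmaps}
        missing_rmap = sorted(forward - reverse)   # file mapping, no reverse record
        orphan_rmap = sorted(reverse - forward)    # reverse record, no file mapping
        return missing_rmap, orphan_rmap


    extents = [Extent(owner=128, start_block=10, length=4),
               Extent(owner=129, start_block=20, length=2)]
    rmaps = [ReverseMapping(start_block=10, length=4, owner=128),
             ReverseMapping(start_block=30, length=1, owner=130)]

    missing, orphans = cross_reference(extents, rmaps)
    print("no reverse record:", missing)
    print("no forward mapping:", orphans)

Each record reported here is individually well formed; the inconsistency only
becomes visible when the two types of metadata are compared with each other.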

Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which meant that fsck could only respond to errors by erasing files
until errors were no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both goals: restoring metadata consistency and
recovering data.
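
As an illustration of regenerating one structure from another, the sketch
below derives a list of free space from records of used space.
It is a toy model under stated assumptions: ``rebuild_free_space`` and the
``(start, length)`` tuples are hypothetical, and the sketch does not describe
how XFS stores or rebuilds its own metadata.

.. code-block:: python

    def rebuild_free_space(total_blocks, used_runs):
        """Derive free (start, length) runs from the gaps between used runs."""
        free = []
        cursor = 0  # first block not yet accounted for
        for start, length in sorted(used_runs):
            if start > cursor:
                free.append((cursor, start - cursor))
            # Overlapping used runs must not move the cursor backwards.
            cursor = max(cursor, start + length)
        if cursor < total_blocks:
            free.append((cursor, total_blocks - cursor))
        return free


    used = [(0, 4), (10, 6), (12, 2), (30, 2)]
    free_space = rebuild_free_space(32, used)
    print(free_space)

Nothing has to be erased in this model: the damaged free space list is simply
thrown away and recomputed from the redundant records.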

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+

TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.

Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.

The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead I/O appropriately to reduce I/O waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.

Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.

Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.
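
The division of labor between the three components can be sketched as a toy
model.
Everything below is an assumption made for illustration: ``ToyFilesystem``
stands in for the two in-kernel facilities, ``scrub`` stands in for the
userspace driver, and the re-check after a repair is a choice made by this
sketch rather than a statement about how ``xfs_scrub`` behaves.

.. code-block:: python

    class ToyFilesystem:
        """Hypothetical stand-in for the kernel side of online fsck."""

        def __init__(self, corrupt):
            self.corrupt = set(corrupt)   # objects that currently fail checks
            self.unfixable = set()        # objects the repair facility cannot fix

        def check(self, obj):
            """Check facility: return True when the object is consistent."""
            return obj not in self.corrupt

        def repair(self, obj):
            """Repair facility: fix the object unless it is unfixable."""
            if obj not in self.unfixable:
                self.corrupt.discard(obj)


    def scrub(fs, objects):
        """Driver: check each object, repair on failure, then check again."""
        report = {}
        for obj in objects:
            if fs.check(obj):
                report[obj] = "clean"
                continue
            fs.repair(obj)
            report[obj] = "repaired" if fs.check(obj) else "still corrupt"
        return report


    fs = ToyFilesystem(corrupt={"ag0 free space", "inode 133"})
    fs.unfixable.add("inode 133")
    report = scrub(fs, ["superblock", "ag0 free space", "inode 133"])
    for obj, status in report.items():
        print(obj, "->", status)

The driver holds no knowledge of metadata formats in this model; it only
decides what to examine and what to do with each verdict.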

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+

The naming hierarchy is broken up into objects known as directories and files
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.

While a targeted check or repair is underway, the other parts of the filesystem
continue processing I/O requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.
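
A minimal sketch of why sharding helps, assuming one lock per allocation group
(AG).
``ShardedFilesystem`` and its methods are hypothetical names for this toy
model, and a real filesystem would make a blocked writer wait rather than
fail as it does here.

.. code-block:: python

    import threading


    class ShardedFilesystem:
        """Toy model with one lock per allocation group (AG)."""

        def __init__(self, ag_count):
            self.locks = [threading.Lock() for _ in range(ag_count)]
            self.writes = [0] * ag_count

        def write(self, agno):
            """An ordinary request; refused while the AG is being checked."""
            if not self.locks[agno].acquire(blocking=False):
                return False
            try:
                self.writes[agno] += 1
                return True
            finally:
                self.locks[agno].release()

        def check_ag(self, agno, while_locked):
            """Targeted check: hold only this AG's lock and run a callback."""
            with self.locks[agno]:
                return while_locked()


    fs = ShardedFilesystem(ag_count=4)
    # Issue one write to every AG while AG 1 is held for checking.
    results = fs.check_ag(1, lambda: [fs.write(agno) for agno in range(4)])
    print(results)

Only the write aimed at the AG under examination is turned away; the other
three AGs keep serving requests for the whole duration of the check.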

In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.