.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:

..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================

This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and issues about which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU General Public License, v2.
The primary author is Darrick J. Wong.

This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:

1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem
to look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.

Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which means that fsck can only respond to errors by erasing files until
errors are no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both strategies.
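
As a rough illustration of the cross-referencing and regeneration described
above, the following Python sketch models a primary index (file block mappings)
and a secondary index (reverse mappings) as plain dictionaries.
It is a toy model and not XFS code: the record layout and the names
``rebuild_reverse`` and ``cross_reference`` are assumptions made for this
example only.

```python
from typing import Dict, List, Tuple

# Primary metadata (assumed layout): inode -> [(file_offset, start_block, length)].
ForwardMap = Dict[int, List[Tuple[int, int, int]]]
# Secondary metadata (assumed layout): start_block -> (owner_inode, file_offset, length).
ReverseMap = Dict[int, Tuple[int, int, int]]


def rebuild_reverse(forward: ForwardMap) -> ReverseMap:
    """Regenerate the reverse index from the forward mappings alone."""
    reverse: ReverseMap = {}
    for inode, extents in forward.items():
        for offset, start, length in extents:
            reverse[start] = (inode, offset, length)
    return reverse


def cross_reference(forward: ForwardMap, reverse: ReverseMap) -> List[str]:
    """Describe every record on which the two indexes disagree."""
    problems: List[str] = []
    expected = rebuild_reverse(forward)
    for start, record in sorted(expected.items()):
        if reverse.get(start) != record:
            problems.append(f"block {start}: reverse record missing or wrong")
    for start in sorted(set(reverse) - set(expected)):
        problems.append(f"block {start}: reverse record has no owner mapping")
    return problems


# Hypothetical data: two files with three extents between them.
forward = {128: [(0, 1000, 8), (8, 2000, 4)], 129: [(0, 3000, 16)]}
reverse = rebuild_reverse(forward)

# Simulate the loss of one secondary record, detect it, then regenerate.
del reverse[2000]
problems = cross_reference(forward, reverse)
reverse = rebuild_reverse(forward)
```

In this model the damaged secondary index is detected by comparison with the
primary metadata and then rebuilt from it, without erasing any file.
A filesystem lacking the second index would have nothing to compare against.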

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+

TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.

Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.

The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead IO appropriately to reduce IO waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.

Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.

Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.

+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+

The naming hierarchy is broken up into objects known as directories and files,
and the physical space is split into pieces known as allocation groups.
Sharding the filesystem this way enables better performance on highly parallel
systems and helps to contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.

While this is going on, other parts continue processing IO requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.

In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system
is running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.
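
To make the sharding argument concrete, here is a small Python model in which
each allocation group carries its own lock, so that checking or repairing one
group does not block allocations in another.
This is only a sketch under stated assumptions: ``AllocationGroup``,
``scrub_ag``, ``repair_ag``, and ``allocate`` are hypothetical names, and the
single free-space counter stands in for the much richer per-group metadata that
the real kernel code examines.

```python
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllocationGroup:
    """Hypothetical stand-in for one shard of the filesystem."""
    agno: int
    free_blocks: int          # summary counter for this group
    free_extents: List[int]   # lengths of the free extents it summarizes
    lock: threading.Lock = field(default_factory=threading.Lock)


def scrub_ag(ag: AllocationGroup) -> bool:
    """Check one group: the summary counter must match the free extents."""
    with ag.lock:  # only this shard is locked during the check
        return ag.free_blocks == sum(ag.free_extents)


def repair_ag(ag: AllocationGroup) -> None:
    """Regenerate the summary counter from the records it condenses."""
    with ag.lock:
        ag.free_blocks = sum(ag.free_extents)


def allocate(ag: AllocationGroup, nblocks: int) -> bool:
    """Ordinary IO path: carve blocks out of the first extent that fits."""
    with ag.lock:
        for i, length in enumerate(ag.free_extents):
            if length >= nblocks:
                ag.free_extents[i] = length - nblocks
                ag.free_blocks -= nblocks
                return True
        return False


# Group 0 has a stale counter (10 instead of 12); group 1 is healthy.
groups = [AllocationGroup(0, 10, [4, 8]), AllocationGroup(1, 12, [4, 8])]

damaged = [ag.agno for ag in groups if not scrub_ag(ag)]

# While group 0 is held, as it would be during a repair, group 1 still serves IO.
with groups[0].lock:
    served = allocate(groups[1], 4)

for agno in damaged:
    repair_ag(groups[agno])
```

Because each lock covers a single group, the cost of a check or repair is
confined to the shard being examined, which is the property the text above
relies on.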