.. SPDX-License-Identifier: GPL-2.0
.. _xfs_online_fsck_design:

..
        Mapping of heading styles within this document:
        Heading 1 uses "====" above and below
        Heading 2 uses "===="
        Heading 3 uses "----"
        Heading 4 uses "````"
        Heading 5 uses "^^^^"
        Heading 6 uses "~~~~"
        Heading 7 uses "...."

        Sections are manually numbered because apparently that's what everyone
        does in the kernel.

======================
XFS Online Fsck Design
======================

This document captures the design of the online filesystem check feature for
XFS.
The purpose of this document is threefold:

- To help kernel distributors understand exactly what the XFS online fsck
  feature is, and issues about which they should be aware.

- To help people reading the code to familiarize themselves with the relevant
  concepts and design points before they start digging into the code.

- To help developers maintaining the system by capturing the reasons
  supporting higher level decision making.

As the online fsck code is merged, the links in this document to topic branches
will be replaced with links to code.

This document is licensed under the terms of the GNU Public License, v2.
The primary author is Darrick J. Wong.

This design document is split into seven parts.
Part 1 defines what fsck tools are and the motivations for writing a new one.
Parts 2 and 3 present a high level overview of how the online fsck process
works and how it is tested to ensure correct functionality.
Part 4 discusses the user interface and the intended usage modes of the new
program.
Parts 5 and 6 show off the high level components and how they fit together, and
then present case studies of how each repair function actually works.
Part 7 sums up what has been discussed so far and speculates about what else
might be built atop online fsck.

.. contents:: Table of Contents
   :local:

1. What is a Filesystem Check?
==============================

A Unix filesystem has four main responsibilities:

- Provide a hierarchy of names through which application programs can associate
  arbitrary blobs of data for any length of time,

- Virtualize physical storage media across those names,

- Retrieve the named data blobs at any time, and

- Examine resource usage.

Metadata directly supporting these functions (e.g. files, directories, space
mappings) are sometimes called primary metadata.
Secondary metadata (e.g. reverse mapping and directory parent pointers) support
operations internal to the filesystem, such as internal consistency checking
and reorganization.
Summary metadata, as the name implies, condense information contained in
primary metadata for performance reasons.

The filesystem check (fsck) tool examines all the metadata in a filesystem to
look for errors.
In addition to looking for obvious metadata corruptions, fsck also
cross-references different types of metadata records with each other to look
for inconsistencies.
People do not like losing data, so most fsck tools also contain some ability
to correct any problems found.
As a word of caution -- the primary goal of most Linux fsck tools is to restore
the filesystem metadata to a consistent state, not to maximize the data
recovered.
That precedent will not be challenged here.

Filesystems of the 20th century generally lacked any redundancy in the ondisk
format, which meant that fsck could only respond to errors by erasing files
until errors were no longer detected.
More recent filesystem designs contain enough redundancy in their metadata that
it is now possible to regenerate data structures when non-catastrophic errors
occur; this capability aids both strategies.
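The cross-referencing and regeneration ideas above can be illustrated with a
toy model.
This is only a sketch and not how XFS stores or checks its metadata: the
dictionary layout and the names ``cross_reference`` and ``rebuild_forward`` are
hypothetical, and it assumes the secondary records have already been judged
trustworthy.
Primary metadata map a file offset to a block, and secondary metadata map each
block back to its owner.

```python
# Toy model (hypothetical layout, not the XFS ondisk format):
#   forward: (owner inode, file offset) -> physical block   (primary metadata)
#   reverse: physical block -> (owner inode, file offset)   (secondary metadata)

def cross_reference(forward, reverse):
    """Return a list of inconsistencies between the two record sets."""
    problems = []
    for (owner, offset), block in forward.items():
        if reverse.get(block) != (owner, offset):
            problems.append(
                ("mapping without matching reverse record", owner, offset, block))
    for block, (owner, offset) in reverse.items():
        if forward.get((owner, offset)) != block:
            problems.append(
                ("reverse record without matching mapping", owner, offset, block))
    return problems


def rebuild_forward(reverse):
    """Regenerate the primary records from the redundant secondary ones."""
    return {(owner, offset): block for block, (owner, offset) in reverse.items()}


# Inode 129 has lost its mapping, but the reverse record for block 1002 survives.
forward = {(128, 0): 1000, (128, 1): 1001}
reverse = {1000: (128, 0), 1001: (128, 1), 1002: (129, 0)}

problems = cross_reference(forward, reverse)
repaired = rebuild_forward(reverse)
```

In this model, ``problems`` reports the orphaned reverse record, and the rebuilt
mapping passes a second ``cross_reference`` call, so nothing has to be erased
to restore consistency.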
+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| System administrators avoid data loss by increasing the number of        |
| separate storage systems through the creation of backups; and they avoid |
| downtime by increasing the redundancy of each storage system through the |
| creation of RAID arrays.                                                 |
| fsck tools address only the first problem.                               |
+--------------------------------------------------------------------------+

TLDR; Show Me the Code!
-----------------------

Code is posted to the kernel.org git trees as follows:
`kernel changes `_,
`userspace changes `_, and
`QA test changes `_.
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.

Existing Tools
--------------

The online fsck tool described here will be the third tool in the history of
XFS (on Linux) to check and repair filesystems.
Two programs precede it:

The first program, ``xfs_check``, was created as part of the XFS debugger
(``xfs_db``) and can only be used with unmounted filesystems.
It walks all metadata in the filesystem looking for inconsistencies, though it
lacks any ability to repair what it finds.
Due to its high memory requirements and inability to repair things, this
program is now deprecated and will not be discussed further.

The second program, ``xfs_repair``, was created to be faster and more robust
than the first program.
Like its predecessor, it can only be used with unmounted filesystems.
It uses extent-based in-memory data structures to reduce memory consumption,
and tries to schedule readahead IO appropriately to reduce IO waiting time
while it scans the metadata of the entire filesystem.
The most important feature of this tool is its ability to respond to
inconsistencies in file metadata and the directory tree by erasing things as
needed to eliminate problems.
Space usage metadata are rebuilt from the observed file metadata.

Problem Statement
-----------------

The current XFS tools leave several problems unsolved:

1. **User programs** suddenly **lose access** to the filesystem when unexpected
   shutdowns occur as a result of silent corruptions in the metadata.
   These occur **unpredictably** and often without warning.

2. **Users** experience a **total loss of service** during the recovery period
   after an **unexpected shutdown** occurs.

3. **Users** experience a **total loss of service** if the filesystem is taken
   offline to **look for problems** proactively.

4. **Data owners** cannot **check the integrity** of their stored data without
   reading all of it.
   This may expose them to substantial billing costs when a linear media scan
   performed by the storage system administrator might suffice.

5. **System administrators** cannot **schedule** a maintenance window to deal
   with corruptions if they **lack the means** to assess filesystem health
   while the filesystem is online.

6. **Fleet monitoring tools** cannot **automate periodic checks** of filesystem
   health when doing so requires **manual intervention** and downtime.

7. **Users** can be tricked into **doing things they do not desire** when
   malicious actors **exploit quirks of Unicode** to place misleading names
   in directories.

Given this definition of the problems to be solved and the actors who would
benefit, the proposed solution is a third fsck tool that acts on a running
filesystem.

This new third program has three components: an in-kernel facility to check
metadata, an in-kernel facility to repair metadata, and a userspace driver
program to drive fsck activity on a live filesystem.
``xfs_scrub`` is the name of the driver program.
The rest of this document presents the goals and use cases of the new fsck
tool, describes its major design points in connection to those goals, and
discusses the similarities and differences with existing tools.
+--------------------------------------------------------------------------+
| **Note**:                                                                |
+--------------------------------------------------------------------------+
| Throughout this document, the existing offline fsck tool can also be     |
| referred to by its current name "``xfs_repair``".                        |
| The userspace driver program for the new online fsck tool can be         |
| referred to as "``xfs_scrub``".                                          |
| The kernel portion of online fsck that validates metadata is called      |
| "online scrub", and the portion of the kernel that fixes metadata is     |
| called "online repair".                                                  |
+--------------------------------------------------------------------------+

The naming hierarchy is broken up into objects known as directories and files
and the physical space is split into pieces known as allocation groups.
Sharding enables better performance on highly parallel systems and helps to
contain the damage when corruptions occur.
The division of the filesystem into principal objects (allocation groups and
inodes) means that there are ample opportunities to perform targeted checks and
repairs on a subset of the filesystem.
While this is going on, other parts continue processing IO requests.
Even if a piece of filesystem metadata can only be regenerated by scanning the
entire system, the scan can still be done in the background while other file
operations continue.

In summary, online fsck takes advantage of resource sharding and redundant
metadata to enable targeted checking and repair operations while the system is
running.
This capability will be coupled to automatic system management so that
autonomous self-healing of XFS maximizes service availability.
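The sharding argument above can be illustrated with a toy model.
This is only a sketch under stated assumptions and not kernel code: the
``AllocationGroup`` class, its block counters, and the helpers ``check_group``
and ``write_to_group`` are hypothetical, and a single lock per shard stands in
for the kernel's far more involved locking.
It shows that a check holding one shard's lock does not stop work on another
shard.

```python
import threading


class AllocationGroup:
    """Hypothetical stand-in for one shard of the filesystem's space."""

    def __init__(self, agno, free_blocks, used_blocks, total_blocks):
        self.agno = agno
        self.lock = threading.Lock()  # protects only this shard
        self.free_blocks = free_blocks
        self.used_blocks = used_blocks
        self.total_blocks = total_blocks


def check_group(ag):
    """Targeted check: hold only this shard's lock while examining it."""
    with ag.lock:
        return ag.free_blocks + ag.used_blocks == ag.total_blocks


def write_to_group(ag, nblocks):
    """Simulated allocation; waits only if this same shard is locked."""
    with ag.lock:
        if nblocks > ag.free_blocks:
            return False
        ag.free_blocks -= nblocks
        ag.used_blocks += nblocks
        return True


groups = [
    AllocationGroup(0, free_blocks=60, used_blocks=40, total_blocks=100),
    AllocationGroup(1, free_blocks=10, used_blocks=80, total_blocks=100),  # counters disagree
]

# While shard 1 is locked, as it would be during a check or repair,
# shard 0 keeps servicing allocations.
with groups[1].lock:
    ag0_write_ok = write_to_group(groups[0], 5)

results = {ag.agno: check_group(ag) for ag in groups}
```

Here ``results`` flags only shard 1 as inconsistent, so a repair could be
confined to that shard while shard 0 continues to handle requests.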