.. SPDX-License-Identifier: GPL-2.0

====================================
File system Monitoring with fanotify
====================================

File system Error Reporting
===========================

Fanotify supports the FAN_FS_ERROR event type for file system-wide error
reporting.  It is meant to be used by file system health monitoring
daemons, which listen for these events and take actions (notify
sysadmin, start recovery) when a file system problem is detected.

By design, a FAN_FS_ERROR notification exposes sufficient information
for a monitoring tool to know that a problem has happened in the file
system.  It does not necessarily provide a user space application with
semantics to verify that an IO operation was successfully executed; that
is out of scope for this feature.  Instead, it is only meant as a
framework for early detection of file system problems and for reporting
them to recovery tools.

When a file system operation fails, it is common for dozens of kernel
errors to cascade after the initial failure, hiding the original failure
log, which is usually the most useful debug data to troubleshoot the
problem.  For this reason, FAN_FS_ERROR tries to report only the first
error that occurred for a file system since the last notification, and
it simply counts additional errors.  This ensures that the most
important pieces of information are never lost.

FAN_FS_ERROR requires the fanotify group to be set up with the
FAN_REPORT_FID flag.

At the time of this writing, the only file system that emits FAN_FS_ERROR
notifications is Ext4.

A FAN_FS_ERROR notification has the following format::

     [ Notification Metadata (Mandatory) ]
     [ Generic Error Record  (Mandatory) ]
     [ FID record            (Mandatory) ]

The order of records is not guaranteed, and new records might be added
in the future.  Therefore, applications must not rely on the order and
must be prepared to skip over unknown records.  Please refer to
``samples/fanotify/fs-monitor.c`` for an example parser.
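
The rule about ordering and unknown records can be sketched as a small
record walker.  This is an illustration only, not the parser from the
sample: ``walk_records()`` and ``struct record_summary`` are hypothetical
names, and the code assumes the buffer holds one complete event as
returned by ``read()`` on the fanotify descriptor.  Each record is
stepped over using the ``len`` field of its header, so record types the
program does not know are skipped rather than misparsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/fanotify.h>

/* Fallback for headers that predate FAN_FS_ERROR. */
#ifndef FAN_EVENT_INFO_TYPE_ERROR
#define FAN_EVENT_INFO_TYPE_ERROR 5
#endif

/* Hypothetical summary of the records found in one event. */
struct record_summary {
    int error_records;
    int fid_records;
    int unknown_records;
};

/*
 * Walk the info records that follow the metadata of one event.
 * Returns 0 on success, -1 if the event or a record is malformed.
 */
static int walk_records(const unsigned char *event, size_t avail,
                        struct record_summary *out)
{
    struct fanotify_event_metadata meta;
    struct fanotify_event_info_header hdr;
    size_t off;

    assert(event != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    if (avail < sizeof(meta))
        return -1;
    memcpy(&meta, event, sizeof(meta));
    if (meta.metadata_len < sizeof(meta) ||
        meta.event_len < meta.metadata_len || meta.event_len > avail)
        return -1;

    for (off = meta.metadata_len; off < meta.event_len; off += hdr.len) {
        if (meta.event_len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, event + off, sizeof(hdr));
        /* A record must at least cover its header and fit the event. */
        if (hdr.len < sizeof(hdr) || hdr.len > meta.event_len - off)
            return -1;

        switch (hdr.info_type) {
        case FAN_EVENT_INFO_TYPE_ERROR:
            out->error_records++;
            break;
        case FAN_EVENT_INFO_TYPE_FID:
            out->fid_records++;
            break;
        default:
            /* Unknown record: skip it using hdr.len. */
            out->unknown_records++;
            break;
        }
    }
    return 0;
}
```

Because the walker dispatches on ``info_type`` instead of position, it
gives the same result whichever order the kernel emits the records in.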

Generic error record
--------------------

The generic error record provides enough information for a file
system-agnostic tool to learn about a problem in the file system, without
providing any additional details about the problem.  This record is
identified by ``struct fanotify_event_info_header.info_type`` being set
to FAN_EVENT_INFO_TYPE_ERROR.

::

     struct fanotify_event_info_error {
         struct fanotify_event_info_header hdr;
         __s32 error;
         __u32 error_count;
     };

The ``error`` field identifies the type of error using errno values.
``error_count`` tracks the number of errors that occurred and were
suppressed to preserve the original error information, since the last
notification.
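
As an illustration of how a monitoring tool might consume these two
fields, the hypothetical helper ``format_error_record()`` below renders a
record as a log line.  It assumes that ``error`` holds a positive errno
value that ``strerror()`` can describe.  The fallback structure
definition mirrors the layout shown above and is only used when the
installed headers predate FAN_FS_ERROR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <linux/fanotify.h>

/* Fallback for headers that predate FAN_FS_ERROR. */
#ifndef FAN_FS_ERROR
#define FAN_FS_ERROR 0x00008000
struct fanotify_event_info_error {
    struct fanotify_event_info_header hdr;
    __s32 error;
    __u32 error_count;
};
#endif

/*
 * Hypothetical helper: write a one-line description of a generic error
 * record into 'buf'.  Returns what snprintf() returns.
 */
static int format_error_record(const struct fanotify_event_info_error *rec,
                               char *buf, size_t len)
{
    assert(rec != NULL && buf != NULL);

    /* Assumption: 'error' is a positive errno value. */
    return snprintf(buf, len, "errno %d (%s), error_count %u",
                    (int)rec->error, strerror(rec->error),
                    (unsigned int)rec->error_count);
}
```

A daemon could pass the resulting line to its logging facility or attach
it to the alert sent to the administrator.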

FID record
----------

The FID record can be used to uniquely identify the inode that triggered
the error through the combination of fsid and file handle.  A file
system-specific application can use that information to attempt a
recovery procedure.  Errors that are not related to an inode are reported
with an empty file handle of type FILEID_INVALID.
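
The check for errors that are not tied to an inode can be sketched as
follows.  The helper ``fid_refers_to_inode()`` and ``struct
handle_prefix`` are hypothetical names.  The sketch assumes that the file
handle stored after the fsid starts with the ``handle_bytes`` and
``handle_type`` fields of ``struct file_handle``, and that
FILEID_INVALID has the value ``0xff``, which user space headers may not
export.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/fanotify.h>

/* Assumption: FILEID_INVALID is 0xff and not exported to user space. */
#ifndef FILEID_INVALID
#define FILEID_INVALID 0xff
#endif

/* Assumed mirror of the fixed leading fields of struct file_handle. */
struct handle_prefix {
    unsigned int handle_bytes;
    int handle_type;
};

/*
 * Hypothetical helper: inspect one FID record.
 * Returns 1 if it refers to an inode, 0 if the handle is of type
 * FILEID_INVALID, and -1 if the record is malformed.
 */
static int fid_refers_to_inode(const unsigned char *rec, size_t len)
{
    struct fanotify_event_info_header hdr;
    struct handle_prefix fh;
    size_t off = offsetof(struct fanotify_event_info_fid, handle);

    assert(rec != NULL);

    if (len < off + sizeof(fh))
        return -1;
    memcpy(&hdr, rec, sizeof(hdr));
    if (hdr.info_type != FAN_EVENT_INFO_TYPE_FID ||
        hdr.len > len || hdr.len < off + sizeof(fh))
        return -1;

    /* The file handle follows the header and the fsid. */
    memcpy(&fh, rec + off, sizeof(fh));
    if (fh.handle_bytes > hdr.len - off - sizeof(fh))
        return -1;

    return fh.handle_type != FILEID_INVALID;
}
```

A recovery tool would only try to resolve the handle when the helper
returns 1, and would treat a return value of 0 as an error that concerns
the file system as a whole.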