==========================================
Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches.  That means the devices signal I/O completion to the
operating system before the data has actually reached non-volatile storage.
This behavior speeds up many workloads, but it means the operating system
needs to force data out to non-volatile storage when it performs a data
integrity operation such as fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device.  These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.

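As a userspace illustration of such a data integrity operation, the following
minimal sketch writes a buffer and reports success only after fsync() has
returned.  The helper name write_durably() is hypothetical, and whether the
fsync() call results in a device cache flush depends on the filesystem and on
the device.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path and report success only after fsync()
 * has returned.  Returns 0 on success, -1 on error.
 * write_durably() is a hypothetical helper used for illustration only.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n <= 0) {		/* treat short-circuit cases as errors */
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Data integrity point: the kernel may need to flush the device cache. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Until fsync() returns successfully, a completed write() says nothing about
whether the data would survive a power loss.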

Explicit cache flushes
----------------------

The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts.  In addition, the REQ_PREFLUSH flag can
be set on an otherwise empty bio structure, which causes only an explicit
cache flush without any dependent I/O.  It is recommended to use
the blkdev_issue_flush() helper for a pure cache flush.


Force Unit Access
-----------------

The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.


Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
worry whether the underlying devices need any explicit cache flushing or how
Force Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.

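The effect of setting both flags on one bio can be sketched with a small
userspace model.  This is not kernel code: struct model_dev, the model_*()
functions and the MODEL_* flags are hypothetical stand-ins for a device with a
volatile cache and for REQ_PREFLUSH and REQ_FUA.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for REQ_PREFLUSH and REQ_FUA; the values are arbitrary. */
#define MODEL_PREFLUSH	(1u << 0)
#define MODEL_FUA	(1u << 1)

#define MODEL_NR_BLOCKS	8

/* Toy device: writes land in a volatile cache unless forced to media. */
struct model_dev {
	int media[MODEL_NR_BLOCKS];	/* survives power loss */
	int cache[MODEL_NR_BLOCKS];	/* lost on power loss */
	bool dirty[MODEL_NR_BLOCKS];	/* cache[i] is newer than media[i] */
};

void model_init(struct model_dev *d)
{
	memset(d, 0, sizeof(*d));
}

/* Write back every dirty cache entry, like a device cache flush. */
void model_flush(struct model_dev *d)
{
	for (int i = 0; i < MODEL_NR_BLOCKS; i++) {
		if (d->dirty[i]) {
			d->media[i] = d->cache[i];
			d->dirty[i] = false;
		}
	}
}

/* Submit one write; completion is modelled as the function returning. */
void model_write(struct model_dev *d, int blk, int val, unsigned int flags)
{
	assert(blk >= 0 && blk < MODEL_NR_BLOCKS);

	if (flags & MODEL_PREFLUSH)
		model_flush(d);	/* earlier completed writes reach media first */

	d->cache[blk] = val;
	d->dirty[blk] = true;

	if (flags & MODEL_FUA) {	/* on media before completion */
		d->media[blk] = val;
		d->dirty[blk] = false;
	}
}

/* Power loss: everything that was only in the cache is gone. */
void model_power_loss(struct model_dev *d)
{
	for (int i = 0; i < MODEL_NR_BLOCKS; i++) {
		d->cache[i] = d->media[i];
		d->dirty[i] = false;
	}
}
```

In this model, a journal-style commit record written with
MODEL_PREFLUSH | MODEL_FUA after some plain writes is on media once the write
completes, and so are all writes that completed before it.  A plain write
issued afterwards can still be lost by model_power_loss().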

Implementation details for bio based block drivers
--------------------------------------------------

These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bit needs to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.

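The rule for remapping drivers can be sketched with a toy two-leg striping
model.  This is a userspace illustration under stated assumptions, not kernel
code: struct model_leg, struct model_bio, model_stripe_submit() and the
MODEL_* constants are all hypothetical, and a bio is assumed never to cross a
stripe chunk boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for REQ_PREFLUSH and REQ_FUA; the values are arbitrary. */
#define MODEL_PREFLUSH	(1u << 0)
#define MODEL_FUA	(1u << 1)

#define MODEL_NR_LEGS	2
#define MODEL_STRIPE	8	/* sectors per stripe chunk */

/* Per-leg counters recording what the remapping driver sent down. */
struct model_leg {
	unsigned int flushes;
	unsigned int writes;
	unsigned int fua_writes;
};

struct model_bio {
	unsigned int flags;
	unsigned long sector;
	size_t nr_sectors;	/* 0 means an empty (flush-only) bio */
};

void model_stripe_submit(struct model_leg legs[MODEL_NR_LEGS],
			 const struct model_bio *bio)
{
	struct model_leg *leg;

	assert(legs != NULL && bio != NULL);

	/*
	 * Global flush: earlier writes may sit in the cache of any leg,
	 * so every underlying device has to be flushed.
	 */
	if (bio->flags & MODEL_PREFLUSH) {
		for (int i = 0; i < MODEL_NR_LEGS; i++)
			legs[i].flushes++;
	}

	if (bio->nr_sectors == 0)
		return;		/* flush-only bio, no data to remap */

	/* The data goes to exactly one leg; FUA is propagated with it. */
	leg = &legs[(bio->sector / MODEL_STRIPE) % MODEL_NR_LEGS];
	leg->writes++;
	if (bio->flags & MODEL_FUA)
		leg->fua_writes++;
}
```

The asymmetry is the point of the sketch: the flush fans out to all legs,
while the FUA bit only travels with the data to the leg that stores it.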

Implementation details for request_fn based block drivers
---------------------------------------------------------

For devices that do not support volatile write caches no driver support is
required: the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.  For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::

	blk_queue_write_cache(sdkp->disk->queue, true, false);

and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
REQ_PREFLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_OP_FLUSH request followed by the actual write by the block
layer.  For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::

	blk_queue_write_cache(sdkp->disk->queue, true, true);

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn.  If the FUA bit is not natively supported, the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.

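The decomposition described above can be summarized as a small decision
function.  This is a userspace sketch of the rules as stated in this section,
not the block layer's implementation: model_plan_steps(), the MODEL_* flags
and the STEP_* values are hypothetical names, and the two bool parameters
mirror the two bool arguments of blk_queue_write_cache().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for REQ_PREFLUSH and REQ_FUA; the values are arbitrary. */
#define MODEL_PREFLUSH	(1u << 0)
#define MODEL_FUA	(1u << 1)

/* Steps the driver ends up seeing. */
#define STEP_PREFLUSH	(1u << 0)	/* empty flush before the write */
#define STEP_DATA	(1u << 1)	/* the write itself */
#define STEP_DATA_FUA	(1u << 2)	/* the write carries the FUA bit */
#define STEP_POSTFLUSH	(1u << 3)	/* empty flush after the write */

/*
 * Return the set of steps issued to the driver for one request.
 * A return value of 0 means the request is completed without ever
 * entering the driver.
 */
unsigned int model_plan_steps(unsigned int flags, bool has_data,
			      bool dev_wc, bool dev_fua)
{
	unsigned int steps = 0;

	/* An empty request only makes sense as a pure flush. */
	assert(has_data || (flags & MODEL_PREFLUSH));

	/* No volatile write cache: both bits are stripped. */
	if (!dev_wc)
		flags &= ~(MODEL_PREFLUSH | MODEL_FUA);

	if (flags & MODEL_PREFLUSH)
		steps |= STEP_PREFLUSH;

	if (has_data) {
		steps |= STEP_DATA;
		if (flags & MODEL_FUA) {
			if (dev_fua)
				steps |= STEP_DATA_FUA;	/* native FUA */
			else
				steps |= STEP_POSTFLUSH;	/* emulated FUA */
		}
	}
	return steps;
}
```

For example, a write with both flags set on a device that has a write cache
but no native FUA support yields a flush, the write and a second flush, while
the same write on a device without a write cache yields only the plain write.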