xref: /openbmc/linux/Documentation/driver-api/mmc/mmc-async-req.rst (revision 0898782247ae533d1f4e47a06bc5d4870931b284)
========================
MMC Asynchronous Request
========================

Rationale
=========

How significant is the cache maintenance overhead?

It depends. Fast eMMC and multiple cache levels with speculative cache
pre-fetch make the cache overhead relatively significant. If the DMA
preparations for the next request are done in parallel with the current
transfer, the DMA preparation overhead does not affect the MMC performance.

The intention of non-blocking (asynchronous) MMC requests is to minimize the
time between when an MMC request ends and the next MMC request begins.

With mmc_wait_for_req(), the MMC controller is idle while dma_map_sg()
and dma_unmap_sg() are processing. Using non-blocking MMC requests makes
it possible to prepare the caches for the next job in parallel with an
active MMC request.

MMC block driver
================

The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.

The increase in throughput is proportional to the time it takes to
prepare a request (the major part of the preparation is dma_map_sg() and
dma_unmap_sg()) and to how fast the memory is. The faster the MMC/SD card
is, the more significant the request preparation time becomes. Roughly,
the expected performance gain is 5% for large writes and 10% for large
reads on a platform with an L2 cache. In power save mode, when clocks run
at a lower frequency, the DMA preparation may cost even more. As long as
these slower preparations run in parallel with the transfer, performance
won't be affected.

Details on measurements from IOZone and mmc_test
================================================

https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req

MMC core API extension
======================

There is one new public function, mmc_start_req().

It starts a new MMC command request for a host. The function is not
truly non-blocking: if there is an ongoing async request, it waits for
that request to complete, starts the new one, and then returns; it does
not wait for the new request to complete. If there is no ongoing request,
it starts the new request and returns immediately.

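A caller-side sketch of the resulting issue loop (illustrative only:
fetch_next_request() is a made-up helper, and the exact signature of
mmc_start_req() has varied across kernel versions)::

  struct mmc_async_req *areq;
  enum mmc_blk_status status;

  while ((areq = fetch_next_request()) != NULL) {
          /*
           * Waits for the previously started request (if any) to
           * complete, then starts areq and returns without waiting
           * for it to finish.
           */
          mmc_start_req(host, areq, &status);
  }
  /* passing NULL waits for the last outstanding request */
  mmc_start_req(host, NULL, &status);
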
MMC host extensions
===================

There are two optional members in mmc_host_ops -- pre_req() and
post_req() -- that the host driver may implement in order to move work
to before and after the actual mmc_host_ops.request() function is called.

In the DMA case, pre_req() may do dma_map_sg() and prepare the DMA
descriptor, and post_req() runs dma_unmap_sg().

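For illustration, a host driver's hooks might look roughly like this (a
sketch only: the foo_ names are hypothetical, and the exact hook
signatures have varied across kernel versions)::

  static void foo_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
                          bool is_first_req)
  {
          struct mmc_data *data = mrq->data;

          if (!data)
                  return;
          /* map the scatterlist ahead of .request() being called */
          dma_map_sg(mmc_dev(mmc), data->sg, data->sg_len,
                     data->flags & MMC_DATA_WRITE ?
                     DMA_TO_DEVICE : DMA_FROM_DEVICE);
          /* the DMA descriptor may be built here as well */
  }

  static void foo_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
                           int err)
  {
          struct mmc_data *data = mrq->data;

          if (data)
                  dma_unmap_sg(mmc_dev(mmc), data->sg, data->sg_len,
                               data->flags & MMC_DATA_WRITE ?
                               DMA_TO_DEVICE : DMA_FROM_DEVICE);
  }
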
Optimize for the first request
==============================

The first request in a series of requests can't be prepared in parallel
with the previous transfer, since there is no previous request.

The argument is_first_req in pre_req() indicates that there is no previous
request. The host driver may optimize for this scenario to minimize the
performance loss. One way to do this is to split the current request in
two chunks: prepare the first chunk and start the request, then prepare
the second chunk while the first chunk is transferring.

Pseudocode to handle the is_first_req scenario with minimal prepare overhead::

  if (is_first_req && req->size > threshold) {
     /* start MMC transfer for the complete transfer size */
     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);

     /*
      * Begin to prepare DMA while cmd is being processed by MMC.
      * The first chunk of the request should take the same time
      * to prepare as the "MMC process command time".
      * If the prepare time exceeds the MMC cmd time,
      * the transfer is delayed; guesstimate max 4k as first chunk size.
      */
     prepare_1st_chunk_for_dma(req);
     /* flush pending desc to the DMAC (dmaengine.h) */
     dma_issue_pending(req->dma_desc);

     prepare_2nd_chunk_for_dma(req);
     /*
      * The second issue_pending should be called before MMC runs out
      * of the first chunk. If the MMC runs out of the first data chunk
      * before this call, the transfer is delayed.
      */
     dma_issue_pending(req->dma_desc);
  }