========================
MMC Asynchronous Request
========================

Rationale
=========

How significant is the cache maintenance overhead?

It depends. Fast eMMC and multiple cache levels with speculative cache
pre-fetch make the cache overhead relatively significant. If the DMA
preparations for the next request are done in parallel with the current
transfer, the DMA preparation overhead would not affect the MMC performance.

The intention of non-blocking (asynchronous) MMC requests is to minimize the
time between when an MMC request ends and another MMC request begins.

Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
dma_unmap_sg are processing. Using non-blocking MMC requests makes it
possible to prepare the caches for the next job in parallel with an active
MMC request.

MMC block driver
================

The mmc_blk_issue_rw_rq() function in the MMC block driver is made
non-blocking.
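
A simplified sketch of how such a non-blocking issue loop can interleave
preparation of the next request with the ongoing transfer (pseudocode with
hypothetical helper names, not the actual driver code)::

	while (requests remain in the block queue) {
		prev = cur;
		cur = fetch_next_request();
		/* prepare caches/DMA for cur while prev may still transfer */
		prepare(cur);			/* e.g. dma_map_sg() */
		/* waits for prev (if any) to complete, then starts cur */
		mmc_start_req(host, cur);
		if (prev)
			post_process(prev);	/* e.g. dma_unmap_sg() */
	}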

The increase in throughput is proportional to the time it takes to
prepare a request (the major part of the preparation is dma_map_sg() and
dma_unmap_sg()) and to how fast the memory is. The faster the MMC/SD is, the
more significant the prepare request time becomes. Roughly, the expected
performance gain is 5% for large writes and 10% for large reads on an L2-cache
platform. In power save mode, when clocks run at a lower frequency, the DMA
preparation may cost even more. As long as these slower preparations are run
in parallel with the transfer, performance won't be affected.

Details on measurements from IOZone and mmc_test
================================================

https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req

MMC core API extension
======================

There is one new public function, mmc_start_req().

It starts a new MMC command request for a host. The function isn't
truly non-blocking. If there is an ongoing async request, it waits
for completion of that request, starts the new one, and returns; it
doesn't wait for the new request to complete.
If there is no ongoing
request, it starts the new request and returns immediately.

MMC host extensions
===================

There are two optional members in mmc_host_ops -- pre_req() and
post_req() -- that the host driver may implement in order to move work
to before and after the actual mmc_host_ops.request() function is called.

In the DMA case, pre_req() may do dma_map_sg() and prepare the DMA
descriptor, and post_req() runs the dma_unmap_sg().

Optimize for the first request
==============================

The first request in a series of requests can't be prepared in parallel
with the previous transfer, since there is no previous request.

The argument is_first_req in pre_req() indicates that there is no previous
request. The host driver may optimize for this scenario to minimize
the performance loss. A way to optimize for this is to split the current
request in two chunks, prepare the first chunk and start the request,
and finally prepare the second chunk and start the transfer.

Pseudocode to handle the is_first_req scenario with minimal prepare overhead::

  if (is_first_req && req->size > threshold)
     /* start MMC transfer for the complete transfer size */
     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);

     /*
      * Begin to prepare DMA while cmd is being processed by MMC.
      * The first chunk of the request should take the same time
      * to prepare as the "MMC process command time".
      * If the prepare time exceeds the MMC cmd time,
      * the transfer is delayed; guesstimate max 4k as first chunk size.
      */
     prepare_1st_chunk_for_dma(req);
     /* flush pending desc to the DMAC (dmaengine.h) */
     dma_issue_pending(req->dma_desc);

     prepare_2nd_chunk_for_dma(req);
     /*
      * The second issue_pending should be called before MMC runs out
      * of the first chunk. If the MMC runs out of the first data chunk
      * before this call, the transfer is delayed.
      */
     dma_issue_pending(req->dma_desc);
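
For reference, the pre_req()/post_req() hooks described under "MMC host
extensions" follow roughly this pattern. This is a sketch only: the helper
names and the elided dma_map_sg()/dma_unmap_sg() arguments are placeholders,
not a real host driver::

  static void my_pre_req(struct mmc_host *host, struct mmc_request *mrq,
                         bool is_first_req)
  {
          /* map the scatterlist and build the DMA descriptor up front,
           * possibly while a previous request is still transferring */
          dma_map_sg(...);
          prepare_dma_descriptor(mrq);
  }

  static void my_post_req(struct mmc_host *host, struct mmc_request *mrq,
                          int err)
  {
          /* tear down the mapping after the transfer has completed */
          dma_unmap_sg(...);
  }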