.. SPDX-License-Identifier: GPL-2.0

=======
SCSI EH
=======

This document describes SCSI midlayer error handling infrastructure.
Please refer to Documentation/scsi/scsi_mid_low_api.rst for more
information regarding SCSI midlayer.

.. TABLE OF CONTENTS

   [1] How SCSI commands travel through the midlayer and to EH
       [1-1] struct scsi_cmnd
       [1-2] How do scmd's get completed?
           [1-2-1] Completing a scmd w/ scsi_done
           [1-2-2] Completing a scmd w/ timeout
       [1-3] Asynchronous command aborts
       [1-4] How EH takes over
   [2] How SCSI EH works
       [2-1] EH through fine-grained callbacks
           [2-1-1] Overview
           [2-1-2] Flow of scmds through EH
           [2-1-3] Flow of control
       [2-2] EH through transportt->eh_strategy_handler()
           [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
           [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
           [2-2-3] Things to consider


1. How SCSI commands travel through the midlayer and to EH
==========================================================

1.1 struct scsi_cmnd
--------------------

Each SCSI command is represented with struct scsi_cmnd (== scmd).  A
scmd has two list_head's to link itself into lists.  The two are
scmd->list and scmd->eh_entry.  The former is used for the free list or
the per-device allocated scmd list and is not of much interest to this
EH discussion.  The latter is used for completion and EH lists and,
unless otherwise stated, scmds are always linked using scmd->eh_entry
in this discussion.


1.2 How do scmd's get completed?
--------------------------------

Once the LLDD gets hold of a scmd, either the LLDD will complete the
command by calling the scsi_done callback passed from the midlayer when
invoking hostt->queuecommand(), or the block layer will time it out.


1.2.1 Completing a scmd w/ scsi_done
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For all non-EH commands, scsi_done() is the completion callback.
It just calls blk_complete_request() to delete the block layer timer and
raise SCSI_SOFTIRQ.

The SCSI_SOFTIRQ handler scsi_softirq calls scsi_decide_disposition(),
which looks at the scmd->result value and sense data to determine what
to do with the command.

 - SUCCESS

        scsi_finish_command() is invoked for the command.  The
        function does some maintenance chores and then calls
        scsi_io_completion() to finish the I/O.
        scsi_io_completion() then notifies the block layer of
        the completed request by calling blk_end_request and
        friends or figures out what to do with the remainder
        of the data in case of an error.

 - NEEDS_RETRY

 - ADD_TO_MLQUEUE

        scmd is requeued to blk queue.

 - otherwise

        scsi_eh_scmd_add(scmd) is invoked for the command.  See
        [1-4] for details of this function.


1.2.2 Completing a scmd w/ timeout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The timeout handler is scsi_timeout().  When a timeout occurs, this
function does the following.

 1. Invokes the optional hostt->eh_timed_out() callback.  The return
    value can be one of

    - BLK_EH_RESET_TIMER
        This indicates that more time is required to finish the
        command.  Timer is restarted.

    - BLK_EH_DONE
        eh_timed_out() callback did not handle the command.
        Step #2 is taken.

 2. scsi_abort_command() is invoked to schedule an asynchronous abort which may
    issue a retry scmd->allowed + 1 times.  Asynchronous aborts are not invoked
    for commands for which the SCSI_EH_ABORT_SCHEDULED flag is set (this
    indicates that the command had already been aborted once, and this is a
    retry which failed), when retries are exceeded, or when the EH deadline
    has expired.  In these cases Step #3 is taken.

 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
    command.  See [1-4] for more information.
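
The three timeout steps above can be read as a small decision function.
The following is only a userspace sketch of that reading, not midlayer
code: every ``tmo_*`` name is hypothetical, the callback result values
merely stand in for BLK_EH_RESET_TIMER and BLK_EH_DONE, and "retries
are exceeded" is assumed here to mean retries > allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the eh_timed_out() return values. */
enum tmo_cb_result { TMO_CB_RESET_TIMER, TMO_CB_DONE };

/* What the model decides to do with a timed out command. */
enum tmo_action { TMO_RESTART_TIMER, TMO_ASYNC_ABORT, TMO_ADD_TO_EH };

/* Hypothetical, reduced view of a command; not struct scsi_cmnd. */
struct tmo_cmd {
	bool abort_scheduled;      /* models SCSI_EH_ABORT_SCHEDULED */
	bool eh_deadline_expired;  /* models an expired EH deadline */
	int retries;
	int allowed;
};

typedef enum tmo_cb_result (*tmo_cb_t)(struct tmo_cmd *);

/* Example callback that always asks for more time. */
enum tmo_cb_result tmo_cb_need_more_time(struct tmo_cmd *cmd)
{
	(void)cmd;
	return TMO_CB_RESET_TIMER;
}

enum tmo_action tmo_decide(struct tmo_cmd *cmd, tmo_cb_t cb)
{
	assert(cmd != NULL);

	/* Step 1: the optional callback may ask for the timer to be restarted. */
	if (cb != NULL && cb(cmd) == TMO_CB_RESET_TIMER)
		return TMO_RESTART_TIMER;

	/* Step 3: cases in which no asynchronous abort is scheduled.
	 * The retry comparison is an assumption of this sketch. */
	if (cmd->abort_scheduled || cmd->eh_deadline_expired ||
	    cmd->retries > cmd->allowed)
		return TMO_ADD_TO_EH;

	/* Step 2: schedule the asynchronous abort and remember that we did. */
	cmd->abort_scheduled = true;
	return TMO_ASYNC_ABORT;
}
```

In this model a command that times out twice is aborted asynchronously
the first time and handed to EH the second time, because the first pass
leaves the abort-scheduled marker set.
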

1.3 Asynchronous command aborts
-------------------------------

 After a timeout occurs a command abort is scheduled from
 scsi_abort_command().  If the abort is successful the command
 will either be retried (if the number of retries is not exhausted)
 or terminated with DID_TIME_OUT.

 Otherwise scsi_eh_scmd_add() is invoked for the command.
 See [1-4] for more information.

1.4 How EH takes over
---------------------

scmds enter EH via scsi_eh_scmd_add(), which does the following.

 1. Links scmd->eh_entry to shost->eh_cmd_q

 2. Sets SHOST_RECOVERY bit in shost->shost_state

 3. Increments shost->host_failed

 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed

As can be seen above, once any scmd is added to shost->eh_cmd_q, the
SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
scmd from being issued from the blk queue to the host; eventually, all
scmds on the host either complete normally, fail and get added to
eh_cmd_q, or time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds - i.e. shost->host_busy ==
shost->host_failed.  This wakes up the SCSI EH thread.  So, once woken
up, the SCSI EH thread can expect that all in-flight commands have
failed and are linked on shost->eh_cmd_q.

Note that this does not mean lower layers are quiescent.  If a LLDD
completed a scmd with error status, the LLDD and lower layers are
assumed to forget about the scmd at that point.  However, if a scmd
has timed out, unless hostt->eh_timed_out() made lower layers forget
about the scmd, which currently no LLDD does, the command is still
active as far as lower layers are concerned and completion could
occur at any time.  Of course, all such completions are ignored as the
timer has already expired.
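
The bookkeeping done by scsi_eh_scmd_add() and the wakeup condition
described above can be sketched with a small userspace model.  This is
an illustration under assumptions, not midlayer code: all ``model_*``
names are hypothetical, a singly linked list stands in for the
list_head based shost->eh_cmd_q, and a counter stands in for actually
waking the EH thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced command: only the EH linkage is modeled. */
struct model_cmd {
	struct model_cmd *eh_next;  /* stands in for scmd->eh_entry */
};

/* Hypothetical, reduced host state. */
struct model_host {
	struct model_cmd *eh_cmd_q; /* stands in for shost->eh_cmd_q */
	bool in_recovery;           /* stands in for SHOST_RECOVERY */
	int host_busy;              /* in-flight commands, failed ones included */
	int host_failed;            /* commands handed to EH */
	int eh_wakeups;             /* counts would-be EH thread wakeups */
};

/* Mirrors steps 1 to 4 of the list above. */
void model_eh_scmd_add(struct model_host *h, struct model_cmd *c)
{
	assert(h != NULL && c != NULL);
	c->eh_next = h->eh_cmd_q;              /* 1. link onto eh_cmd_q */
	h->eh_cmd_q = c;
	h->in_recovery = true;                 /* 2. set the recovery state */
	h->host_failed++;                      /* 3. count the failure */
	if (h->host_busy == h->host_failed)    /* 4. wake EH if nothing else is in flight */
		h->eh_wakeups++;
}

/* New commands are held back while recovery is pending. */
bool model_can_issue(const struct model_host *h)
{
	assert(h != NULL);
	return !h->in_recovery;
}

/* A normal completion drops the in-flight count and may be the event
 * that makes host_busy equal host_failed (an assumption of this model
 * about where the check happens). */
void model_complete_ok(struct model_host *h)
{
	assert(h != NULL && h->host_busy > h->host_failed);
	h->host_busy--;
	if (h->in_recovery && h->host_busy == h->host_failed)
		h->eh_wakeups++;
}
```

With two commands in flight, failing one sets the recovery state but
does not wake EH; the wakeup is counted only once the other command
completes or fails as well.
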

We'll talk later about how SCSI EH aborts timed out scmds, i.e. makes
the LLDD forget about them.


2. How SCSI EH works
====================

LLDD's can implement SCSI EH actions in one of the following two
ways.

 - Fine-grained EH callbacks
        LLDD can implement fine-grained EH callbacks and let SCSI
        midlayer drive error handling and call appropriate callbacks.
        This will be discussed further in [2-1].

 - eh_strategy_handler() callback
        This is one big callback which should perform the whole error
        handling.  As such, it should do all chores the SCSI midlayer
        performs during recovery.  This will be discussed in [2-2].

Once recovery is complete, SCSI EH resumes normal operation by
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks door.

 2. Clears SHOST_RECOVERY shost_state bit

 3. Wakes up waiters on shost->host_wait.  This occurs if someone
    calls scsi_block_when_processing_errors() on the host.
    (*QUESTION* why is it needed?  All operations will be blocked
    anyway after it reaches blk queue.)

 4. Kicks the queues of all devices on the host


2.1 EH through fine-grained callbacks
-------------------------------------

2.1.1 Overview
^^^^^^^^^^^^^^

If eh_strategy_handler() is not present, SCSI midlayer takes charge
of driving error handling.  EH has two goals: to make the LLDD, host
and device forget about timed out scmds, and to make them ready for new
commands.  A scmd is said to be recovered if the scmd is forgotten by
lower layers and lower layers are ready to process or fail the scmd
again.

To achieve these goals, EH performs recovery actions with increasing
severity.  Some actions are performed by issuing SCSI commands and
others are performed by invoking one of the following fine-grained
hostt EH callbacks.
Callbacks may be omitted and omitted ones are
considered to fail always.

::

    int (* eh_abort_handler)(struct scsi_cmnd *);
    int (* eh_device_reset_handler)(struct scsi_cmnd *);
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);
    int (* eh_host_reset_handler)(struct scsi_cmnd *);

Higher-severity actions are taken only when lower-severity actions
cannot recover some of the failed scmds.  Also, note that failure of
the highest-severity action means EH failure and results in offlining
of all unrecovered devices.

During recovery, the following rules apply.

 - Recovery actions are performed on failed scmds on the to-do list,
   eh_work_q.  If a recovery action succeeds for a scmd, recovered
   scmds are removed from eh_work_q.

   Note that a single recovery action on a scmd can recover multiple
   scmds.  e.g. resetting a device recovers all failed scmds on the
   device.

 - Higher severity actions are taken iff eh_work_q is not empty after
   lower severity actions are complete.

 - EH reuses failed scmds to issue commands for recovery.  For
   timed-out scmds, SCSI EH ensures that LLDD forgets about a scmd
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved from eh_work_q to EH
local eh_done_q using scsi_eh_finish_cmd().  After all scmds are
recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
either retry or error-finish (notify upper layer of failure) recovered
scmds.

A scmd is retried iff its sdev is still online (not offlined during
EH), REQ_FAILFAST is not set and ++scmd->retries is less than
scmd->allowed.


2.1.2 Flow of scmds through EH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 1. Error completion / time out

    :ACTION: scsi_eh_scmd_add() is invoked for scmd

        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++

    :LOCKING: shost->host_lock

 2. EH starts

    :ACTION: move all scmds to EH's local eh_work_q.  shost->eh_cmd_q
             is cleared.

    :LOCKING: shost->host_lock (not strictly necessary, just for
              consistency)

 3. scmd recovered

    :ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd

        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q

    :LOCKING: none

    :CONCURRENCY: at most one thread per separate eh_work_q to
                  keep queue manipulation lockless

 4. EH completes

    :ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
             layer of failure.  May be called concurrently but must have
             no more than one thread per separate eh_work_q to
             manipulate the queue locklessly

             - scmd is removed from eh_done_q and scmd->eh_entry is cleared
             - if retry is necessary, scmd is requeued using
               scsi_queue_insert()
             - otherwise, scsi_finish_command() is invoked for scmd
             - zero shost->host_failed

    :LOCKING: queue or finish function performs appropriate locking


2.1.3 Flow of control
^^^^^^^^^^^^^^^^^^^^^

 EH through fine-grained callbacks starts from scsi_unjam_host().

``scsi_unjam_host``

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock.  Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    ``scsi_eh_get_sense``

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data.  Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense).  Autosense is recommended for
        performance reasons and as sense information could get out of
        sync between occurrence of CHECK CONDITION and this action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done().
        scsi_decide_disposition() always returns
        FAILED in such cases thus invoking SCSI EH.  When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense() which issues REQUEST_SENSE
           command.  If it fails, no action is taken.  Note that taking
           no action causes higher-severity recovery to be taken for
           the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
                scmd->retries is set to scmd->allowed preventing
                scsi_eh_flush_done_q() from retrying the scmd and
                scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
                scsi_eh_finish_cmd() is invoked.

           - otherwise
                No action.

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    ``scsi_eh_abort_cmds``

        This action is taken for each timed out command when
        no_async_abort is enabled in the host template.
        hostt->eh_abort_handler() is invoked for each scmd.  The
        handler returns SUCCESS if it has succeeded in making the LLDD
        and all related hardware forget about the scmd.

        If a timed out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd.  Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur() which issues
        TEST_UNIT_READY command.  Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    ``scsi_eh_ready_devs``

        This function takes four increasingly more severe measures to
        make failed sdevs ready for new commands.

        1. Invoke scsi_eh_stu()

        ``scsi_eh_stu``

            For each sdev which has failed scmds with valid sense data
            of which scsi_check_sense()'s verdict is FAILED,
            START_STOP_UNIT command is issued w/ start=1.  Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().

            *NOTE* If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds.  Yet, this function EH-finishes all scmds on the
            sdev if STU succeeds, leaving lower layers in an
            inconsistent state.  It seems that STU action should be
            taken only when a sdev has no timed out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        ``scsi_eh_bus_device_reset``

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used.  Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        ``scsi_eh_bus_reset``

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds.  If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        ``scsi_eh_host_reset``

            This is the last resort.  hostt->eh_host_reset_handler()
            is invoked.  If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        ``scsi_eh_offline_sdevs``

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

    ``scsi_eh_flush_done_q``

        At this point all scmds are recovered (or given up) and
        put on eh_done_q by scsi_eh_finish_cmd().  This function
        flushes eh_done_q by either retrying or notifying upper
        layer of failure of the scmds.


2.2 EH through transportt->eh_strategy_handler()
------------------------------------------------

transportt->eh_strategy_handler() is invoked in the place of
scsi_unjam_host() and it is responsible for the whole recovery process.
On completion, the handler should have made lower layers forget about
all failed scmds and left them either ready for new commands or
offline.  Also, it should perform SCSI EH maintenance chores to
maintain integrity of SCSI midlayer.  IOW, of the steps described in
[2-1-2], all steps except for #1 must be implemented by
eh_strategy_handler().


2.2.1 Pre transportt->eh_strategy_handler() SCSI midlayer conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The following conditions are true on entry to the handler.

 - Each failed scmd's eh_flags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


2.2.2 Post transportt->eh_strategy_handler() SCSI midlayer conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd.  Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.


2.2.3 Things to consider
^^^^^^^^^^^^^^^^^^^^^^^^

 - Know that timed out scmds are still active on lower layers.  Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structure,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


Tejun Heo
htejun@gmail.com

11th September 2005