.. SPDX-License-Identifier: GPL-2.0

=======
SCSI EH
=======

This document describes SCSI midlayer error handling infrastructure.
Please refer to Documentation/scsi/scsi_mid_low_api.rst for more
information regarding SCSI midlayer.

.. TABLE OF CONTENTS

   [1] How SCSI commands travel through the midlayer and to EH
       [1-1] struct scsi_cmnd
       [1-2] How do scmd's get completed?
           [1-2-1] Completing a scmd w/ scsi_done
           [1-2-2] Completing a scmd w/ timeout
       [1-3] Asynchronous command aborts
       [1-4] How EH takes over
   [2] How SCSI EH works
       [2-1] EH through fine-grained callbacks
           [2-1-1] Overview
           [2-1-2] Flow of scmds through EH
           [2-1-3] Flow of control
       [2-2] EH through transportt->eh_strategy_handler()
           [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
           [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
           [2-2-3] Things to consider


1. How SCSI commands travel through the midlayer and to EH
==========================================================

1.1 struct scsi_cmnd
--------------------

Each SCSI command is represented with struct scsi_cmnd (== scmd). A
scmd has two list_head's to link itself into lists. The two are
scmd->list and scmd->eh_entry. The former is used for the free list or
the per-device allocated scmd list and is not of much interest to this
EH discussion. The latter is used for completion and EH lists and,
unless otherwise stated, scmds are always linked using scmd->eh_entry
in this discussion.


1.2 How do scmd's get completed?
--------------------------------

Once the LLDD gets hold of a scmd, either the LLDD will complete the
command by calling the scsi_done callback passed from the midlayer when
invoking hostt->queuecommand(), or the block layer will time it out.


1.2.1 Completing a scmd w/ scsi_done
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For all non-EH commands, scsi_done() is the completion callback.
It just calls blk_complete_request() to delete the block layer timer and
raise SCSI_SOFTIRQ.

The SCSI_SOFTIRQ handler, scsi_softirq, calls scsi_decide_disposition()
to determine what to do with the command. scsi_decide_disposition()
looks at the scmd->result value and sense data to make that decision.

 - SUCCESS

        scsi_finish_command() is invoked for the command. The
        function does some maintenance chores and then calls
        scsi_io_completion() to finish the I/O.
        scsi_io_completion() then notifies the block layer of
        the completed request by calling blk_end_request and
        friends or figures out what to do with the remainder
        of the data in case of an error.

 - NEEDS_RETRY

 - ADD_TO_MLQUEUE

        scmd is requeued to blk queue.

 - otherwise

        scsi_eh_scmd_add(scmd) is invoked for the command. See
        [1-4] for details of this function.


1.2.2 Completing a scmd w/ timeout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The timeout handler is scsi_times_out(). When a timeout occurs, this
function

 1. invokes the optional hostt->eh_timed_out() callback. The return
    value can be one of

    - BLK_EH_RESET_TIMER
        This indicates that more time is required to finish the
        command. Timer is restarted. This action is counted as a
        retry and only allowed scmd->allowed + 1(!) times. Once the
        limit is reached, action for BLK_EH_DONE is taken instead.

    - BLK_EH_DONE
        eh_timed_out() callback did not handle the command.
        Step #2 is taken.

 2. scsi_abort_command() is invoked to schedule an asynchronous abort.
    Asynchronous aborts are not invoked for commands for which the
    SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
    had already been aborted once, and this is a retry which failed),
    or when the EH deadline has expired. In these cases Step #3 is taken.

 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
    command. See [1-4] for more information.

1.3 Asynchronous command aborts
-------------------------------

 After a timeout occurs, a command abort is scheduled from
 scsi_abort_command(). If the abort is successful, the command
 will either be retried (if the number of retries is not exhausted)
 or terminated with DID_TIME_OUT.

 Otherwise scsi_eh_scmd_add() is invoked for the command.
 See [1-4] for more information.

1.4 How EH takes over
---------------------

scmds enter EH via scsi_eh_scmd_add(), which does the following.

 1. Links scmd->eh_entry to shost->eh_cmd_q

 2. Sets SHOST_RECOVERY bit in shost->shost_state

 3. Increments shost->host_failed

 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed

As can be seen above, once any scmd is added to shost->eh_cmd_q, the
SHOST_RECOVERY shost_state bit is turned on. This prevents any new
scmd from being issued from the blk queue to the host; eventually, all
scmds on the host either complete normally, fail and get added to
eh_cmd_q, or time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds - i.e. shost->host_busy ==
shost->host_failed. This wakes up the SCSI EH thread. So, once woken
up, the SCSI EH thread can expect that all in-flight commands have
failed and are linked on shost->eh_cmd_q.

Note that this does not mean lower layers are quiescent. If a LLDD
completed a scmd with error status, the LLDD and lower layers are
assumed to forget about the scmd at that point. However, if a scmd
has timed out, unless hostt->eh_timed_out() made lower layers forget
about the scmd, which currently no LLDD does, the command is still
active as far as lower layers are concerned and completion could
occur at any time. Of course, all such completions are ignored as the
timer has already expired.

We'll talk about how SCSI EH takes actions to abort - make LLDD
forget about - timed out scmds later.


2. How SCSI EH works
====================

LLDD's can implement SCSI EH actions in one of the following two
ways.

 - Fine-grained EH callbacks
        LLDD can implement fine-grained EH callbacks and let SCSI
        midlayer drive error handling and call appropriate callbacks.
        This will be discussed further in [2-1].

 - eh_strategy_handler() callback
        This is one big callback which should perform the whole error
        handling. As such, it should do all chores the SCSI midlayer
        performs during recovery. This will be discussed in [2-2].

Once recovery is complete, SCSI EH resumes normal operation by
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks door.

 2. Clears SHOST_RECOVERY shost_state bit

 3. Wakes up waiters on shost->host_wait. This occurs if someone
    calls scsi_block_when_processing_errors() on the host.
    (*QUESTION* why is it needed? All operations will be blocked
    anyway after it reaches blk queue.)

 4. Kicks the queues of all devices on the host.


2.1 EH through fine-grained callbacks
-------------------------------------

2.1.1 Overview
^^^^^^^^^^^^^^

If eh_strategy_handler() is not present, SCSI midlayer takes charge
of driving error handling. EH has two goals: make the LLDD, host and
device forget about timed out scmds, and make them ready for new
commands. A scmd is said to be recovered if the scmd is forgotten by
lower layers and lower layers are ready to process or fail the scmd
again.

To achieve these goals, EH performs recovery actions with increasing
severity. Some actions are performed by issuing SCSI commands and
others are performed by invoking one of the following fine-grained
hostt EH callbacks.
Callbacks may be omitted, and omitted ones are
considered to always fail.

::

    int (* eh_abort_handler)(struct scsi_cmnd *);
    int (* eh_device_reset_handler)(struct scsi_cmnd *);
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);
    int (* eh_host_reset_handler)(struct scsi_cmnd *);

Higher-severity actions are taken only when lower-severity actions
cannot recover some of the failed scmds. Also, note that failure of the
highest-severity action means EH failure and results in offlining of
all unrecovered devices.

During recovery, the following rules are followed

 - Recovery actions are performed on failed scmds on the to do list,
   eh_work_q. If a recovery action succeeds for a scmd, recovered
   scmds are removed from eh_work_q.

   Note that a single recovery action on a scmd can recover multiple
   scmds. e.g. resetting a device recovers all failed scmds on the
   device.

 - Higher severity actions are taken iff eh_work_q is not empty after
   lower severity actions are complete.

 - EH reuses failed scmds to issue commands for recovery. For
   timed-out scmds, SCSI EH ensures that LLDD forgets about a scmd
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved from eh_work_q to the
EH-local eh_done_q using scsi_eh_finish_cmd(). After all scmds are
recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
either retry or error-finish (notify upper layer of failure) recovered
scmds.

A scmd is retried iff its sdev is still online (not offlined during
EH), REQ_FAILFAST is not set and ++scmd->retries is less than
scmd->allowed.


2.1.2 Flow of scmds through EH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 1. Error completion / time out

    :ACTION: scsi_eh_scmd_add() is invoked for scmd

        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++

    :LOCKING: shost->host_lock

 2. EH starts

    :ACTION: move all scmds to EH's local eh_work_q. shost->eh_cmd_q
             is cleared.

    :LOCKING: shost->host_lock (not strictly necessary, just for
              consistency)

 3. scmd recovered

    :ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd

        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q

    :LOCKING: none

    :CONCURRENCY: at most one thread per separate eh_work_q to
                  keep queue manipulation lockless

 4. EH completes

    :ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
             layer of failure. May be called concurrently but must have
             no more than one thread per separate eh_work_q to
             manipulate the queue locklessly

        - scmd is removed from eh_done_q and scmd->eh_entry is cleared
        - if retry is necessary, scmd is requeued using
          scsi_queue_insert()
        - otherwise, scsi_finish_command() is invoked for scmd
        - zero shost->host_failed

    :LOCKING: queue or finish function performs appropriate locking


2.1.3 Flow of control
^^^^^^^^^^^^^^^^^^^^^

 EH through fine-grained callbacks starts from scsi_unjam_host().

``scsi_unjam_host``

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock. Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    ``scsi_eh_get_sense``

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data. Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense). Autosense is recommended for
        performance reasons and because sense information could get
        out of sync between occurrence of CHECK CONDITION and this
        action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done(). scsi_decide_disposition() always returns
        FAILED in such cases thus invoking SCSI EH. When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense() which issues REQUEST_SENSE
           command. If it fails, no action is taken. Note that taking
           no action causes higher-severity recovery to be taken for
           the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
                scmd->retries is set to scmd->allowed preventing
                scsi_eh_flush_done_q() from retrying the scmd and
                scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
                scsi_eh_finish_cmd() invoked

           - otherwise
                No action.

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    ``scsi_eh_abort_cmds``

        This action is taken for each timed out command when
        no_async_abort is enabled in the host template.
        hostt->eh_abort_handler() is invoked for each scmd. The
        handler returns SUCCESS if it has succeeded in making the LLDD
        and all related hardware forget about the scmd.

        If a timed out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd. Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur() which issues
        TEST_UNIT_READY command. Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    ``scsi_eh_ready_devs``

        This function takes four increasingly severe measures to
        make failed sdevs ready for new commands.

        1. Invoke scsi_eh_stu()

        ``scsi_eh_stu``

            For each sdev which has failed scmds with valid sense data
            of which scsi_check_sense()'s verdict is FAILED,
            START_STOP_UNIT command is issued w/ start=1. Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().

            *NOTE* If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds. Yet, this function EH-finishes all scmds on the sdev
            if STU succeeds, leaving lower layers in an inconsistent
            state. It seems that STU action should be taken only when
            a sdev has no timed out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        ``scsi_eh_bus_device_reset``

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used. Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        ``scsi_eh_bus_reset``

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds. If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        ``scsi_eh_host_reset``

            This is the last resort. hostt->eh_host_reset_handler()
            is invoked. If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        ``scsi_eh_offline_sdevs``

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

    ``scsi_eh_flush_done_q``

        At this point all scmds are recovered (or given up) and
        put on eh_done_q by scsi_eh_finish_cmd(). This function
        flushes eh_done_q by either retrying or notifying upper
        layer of failure of the scmds.


2.2 EH through transportt->eh_strategy_handler()
------------------------------------------------

transportt->eh_strategy_handler() is invoked in the place of
scsi_unjam_host() and it is responsible for the whole recovery process.
On completion, the handler should have made lower layers forget about
all failed scmds and left them either ready for new commands or
offline. Also, it should perform SCSI EH maintenance chores to maintain
integrity of SCSI midlayer. IOW, of the steps described in [2-1-2], all
steps except for #1 must be implemented by eh_strategy_handler().


2.2.1 Pre transportt->eh_strategy_handler() SCSI midlayer conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The following conditions are true on entry to the handler.

 - Each failed scmd's eh_flags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


2.2.2 Post transportt->eh_strategy_handler() SCSI midlayer conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd. Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.


2.2.3 Things to consider
^^^^^^^^^^^^^^^^^^^^^^^^

 - Know that timed out scmds are still active on lower layers. Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structure,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


Tejun Heo
htejun@gmail.com

11th September 2005