1 // SPDX-License-Identifier: MIT 2 /* 3 * Copyright © 2014 Intel Corporation 4 */ 5 6 /** 7 * DOC: Logical Rings, Logical Ring Contexts and Execlists 8 * 9 * Motivation: 10 * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". 11 * These expanded contexts enable a number of new abilities, especially 12 * "Execlists" (also implemented in this file). 13 * 14 * One of the main differences with the legacy HW contexts is that logical 15 * ring contexts incorporate many more things to the context's state, like 16 * PDPs or ringbuffer control registers: 17 * 18 * The reason why PDPs are included in the context is straightforward: as 19 * PPGTTs (per-process GTTs) are actually per-context, having the PDPs 20 * contained there mean you don't need to do a ppgtt->switch_mm yourself, 21 * instead, the GPU will do it for you on the context switch. 22 * 23 * But, what about the ringbuffer control registers (head, tail, etc..)? 24 * shouldn't we just need a set of those per engine command streamer? This is 25 * where the name "Logical Rings" starts to make sense: by virtualizing the 26 * rings, the engine cs shifts to a new "ring buffer" with every context 27 * switch. When you want to submit a workload to the GPU you: A) choose your 28 * context, B) find its appropriate virtualized ring, C) write commands to it 29 * and then, finally, D) tell the GPU to switch to that context. 30 * 31 * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch 32 * to a contexts is via a context execution list, ergo "Execlists". 33 * 34 * LRC implementation: 35 * Regarding the creation of contexts, we have: 36 * 37 * - One global default context. 38 * - One local default context for each opened fd. 39 * - One local extra context for each context create ioctl call. 40 * 41 * Now that ringbuffers belong per-context (and not per-engine, like before) 42 * and that contexts are uniquely tied to a given engine (and not reusable, 43 * like before) we need: 44 * 45 * - One ringbuffer per-engine inside each context. 46 * - One backing object per-engine inside each context. 47 * 48 * The global default context starts its life with these new objects fully 49 * allocated and populated. The local default context for each opened fd is 50 * more complex, because we don't know at creation time which engine is going 51 * to use them. To handle this, we have implemented a deferred creation of LR 52 * contexts: 53 * 54 * The local context starts its life as a hollow or blank holder, that only 55 * gets populated for a given engine once we receive an execbuffer. If later 56 * on we receive another execbuffer ioctl for the same context but a different 57 * engine, we allocate/populate a new ringbuffer and context backing object and 58 * so on. 59 * 60 * Finally, regarding local contexts created using the ioctl call: as they are 61 * only allowed with the render ring, we can allocate & populate them right 62 * away (no need to defer anything, at least for now). 63 * 64 * Execlists implementation: 65 * Execlists are the new method by which, on gen8+ hardware, workloads are 66 * submitted for execution (as opposed to the legacy, ringbuffer-based, method). 67 * This method works as follows: 68 * 69 * When a request is committed, its commands (the BB start and any leading or 70 * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer 71 * for the appropriate context. The tail pointer in the hardware context is not 72 * updated at this time, but instead, kept by the driver in the ringbuffer 73 * structure. A structure representing this request is added to a request queue 74 * for the appropriate engine: this structure contains a copy of the context's 75 * tail after the request was written to the ring buffer and a pointer to the 76 * context itself. 77 * 78 * If the engine's request queue was empty before the request was added, the 79 * queue is processed immediately. Otherwise the queue will be processed during 80 * a context switch interrupt. In any case, elements on the queue will get sent 81 * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a 82 * globally unique 20-bits submission ID. 83 * 84 * When execution of a request completes, the GPU updates the context status 85 * buffer with a context complete event and generates a context switch interrupt. 86 * During the interrupt handling, the driver examines the events in the buffer: 87 * for each context complete event, if the announced ID matches that on the head 88 * of the request queue, then that request is retired and removed from the queue. 89 * 90 * After processing, if any requests were retired and the queue is not empty 91 * then a new execution list can be submitted. The two requests at the front of 92 * the queue are next to be submitted but since a context may not occur twice in 93 * an execution list, if subsequent requests have the same ID as the first then 94 * the two requests must be combined. This is done simply by discarding requests 95 * at the head of the queue until either only one requests is left (in which case 96 * we use a NULL second context) or the first two requests have unique IDs. 97 * 98 * By always executing the first two requests in the queue the driver ensures 99 * that the GPU is kept as busy as possible. In the case where a single context 100 * completes but a second context is still executing, the request for this second 101 * context will be at the head of the queue when we remove the first one. This 102 * request will then be resubmitted along with a new request for a different context, 103 * which will cause the hardware to continue executing the second request and queue 104 * the new request (the GPU detects the condition of a context getting preempted 105 * with the same context and optimizes the context switch flow by not doing 106 * preemption, but just sampling the new tail pointer). 107 * 108 */ 109 #include <linux/interrupt.h> 110 111 #include "i915_drv.h" 112 #include "i915_trace.h" 113 #include "i915_vgpu.h" 114 #include "gen8_engine_cs.h" 115 #include "intel_breadcrumbs.h" 116 #include "intel_context.h" 117 #include "intel_engine_pm.h" 118 #include "intel_engine_stats.h" 119 #include "intel_execlists_submission.h" 120 #include "intel_gt.h" 121 #include "intel_gt_pm.h" 122 #include "intel_gt_requests.h" 123 #include "intel_lrc.h" 124 #include "intel_lrc_reg.h" 125 #include "intel_mocs.h" 126 #include "intel_reset.h" 127 #include "intel_ring.h" 128 #include "intel_workarounds.h" 129 #include "shmem_utils.h" 130 131 #define RING_EXECLIST_QFULL (1 << 0x2) 132 #define RING_EXECLIST1_VALID (1 << 0x3) 133 #define RING_EXECLIST0_VALID (1 << 0x4) 134 #define RING_EXECLIST_ACTIVE_STATUS (3 << 0xE) 135 #define RING_EXECLIST1_ACTIVE (1 << 0x11) 136 #define RING_EXECLIST0_ACTIVE (1 << 0x12) 137 138 #define GEN8_CTX_STATUS_IDLE_ACTIVE (1 << 0) 139 #define GEN8_CTX_STATUS_PREEMPTED (1 << 1) 140 #define GEN8_CTX_STATUS_ELEMENT_SWITCH (1 << 2) 141 #define GEN8_CTX_STATUS_ACTIVE_IDLE (1 << 3) 142 #define GEN8_CTX_STATUS_COMPLETE (1 << 4) 143 #define GEN8_CTX_STATUS_LITE_RESTORE (1 << 15) 144 145 #define GEN8_CTX_STATUS_COMPLETED_MASK \ 146 (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED) 147 148 #define GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE (0x1) /* lower csb dword */ 149 #define GEN12_CTX_SWITCH_DETAIL(csb_dw) ((csb_dw) & 0xF) /* upper csb dword */ 150 #define GEN12_CSB_SW_CTX_ID_MASK GENMASK(25, 15) 151 #define GEN12_IDLE_CTX_ID 0x7FF 152 #define GEN12_CSB_CTX_VALID(csb_dw) \ 153 (FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, csb_dw) != GEN12_IDLE_CTX_ID) 154 155 /* Typical size of the average request (2 pipecontrols and a MI_BB) */ 156 #define EXECLISTS_REQUEST_SIZE 64 /* bytes */ 157 158 struct virtual_engine { 159 struct intel_engine_cs base; 160 struct intel_context context; 161 struct rcu_work rcu; 162 163 /* 164 * We allow only a single request through the virtual engine at a time 165 * (each request in the timeline waits for the completion fence of 166 * the previous before being submitted). By restricting ourselves to 167 * only submitting a single request, each request is placed on to a 168 * physical to maximise load spreading (by virtue of the late greedy 169 * scheduling -- each real engine takes the next available request 170 * upon idling). 171 */ 172 struct i915_request *request; 173 174 /* 175 * We keep a rbtree of available virtual engines inside each physical 176 * engine, sorted by priority. Here we preallocate the nodes we need 177 * for the virtual engine, indexed by physical_engine->id. 178 */ 179 struct ve_node { 180 struct rb_node rb; 181 int prio; 182 } nodes[I915_NUM_ENGINES]; 183 184 /* 185 * Keep track of bonded pairs -- restrictions upon on our selection 186 * of physical engines any particular request may be submitted to. 187 * If we receive a submit-fence from a master engine, we will only 188 * use one of sibling_mask physical engines. 189 */ 190 struct ve_bond { 191 const struct intel_engine_cs *master; 192 intel_engine_mask_t sibling_mask; 193 } *bonds; 194 unsigned int num_bonds; 195 196 /* And finally, which physical engines this virtual engine maps onto. */ 197 unsigned int num_siblings; 198 struct intel_engine_cs *siblings[]; 199 }; 200 201 static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) 202 { 203 GEM_BUG_ON(!intel_engine_is_virtual(engine)); 204 return container_of(engine, struct virtual_engine, base); 205 } 206 207 static struct i915_request * 208 __active_request(const struct intel_timeline * const tl, 209 struct i915_request *rq, 210 int error) 211 { 212 struct i915_request *active = rq; 213 214 list_for_each_entry_from_reverse(rq, &tl->requests, link) { 215 if (__i915_request_is_complete(rq)) 216 break; 217 218 if (error) { 219 i915_request_set_error_once(rq, error); 220 __i915_request_skip(rq); 221 } 222 active = rq; 223 } 224 225 return active; 226 } 227 228 static struct i915_request * 229 active_request(const struct intel_timeline * const tl, struct i915_request *rq) 230 { 231 return __active_request(tl, rq, 0); 232 } 233 234 static void ring_set_paused(const struct intel_engine_cs *engine, int state) 235 { 236 /* 237 * We inspect HWS_PREEMPT with a semaphore inside 238 * engine->emit_fini_breadcrumb. If the dword is true, 239 * the ring is paused as the semaphore will busywait 240 * until the dword is false. 241 */ 242 engine->status_page.addr[I915_GEM_HWS_PREEMPT] = state; 243 if (state) 244 wmb(); 245 } 246 247 static struct i915_priolist *to_priolist(struct rb_node *rb) 248 { 249 return rb_entry(rb, struct i915_priolist, node); 250 } 251 252 static int rq_prio(const struct i915_request *rq) 253 { 254 return READ_ONCE(rq->sched.attr.priority); 255 } 256 257 static int effective_prio(const struct i915_request *rq) 258 { 259 int prio = rq_prio(rq); 260 261 /* 262 * If this request is special and must not be interrupted at any 263 * cost, so be it. Note we are only checking the most recent request 264 * in the context and so may be masking an earlier vip request. It 265 * is hoped that under the conditions where nopreempt is used, this 266 * will not matter (i.e. all requests to that context will be 267 * nopreempt for as long as desired). 268 */ 269 if (i915_request_has_nopreempt(rq)) 270 prio = I915_PRIORITY_UNPREEMPTABLE; 271 272 return prio; 273 } 274 275 static int queue_prio(const struct intel_engine_execlists *execlists) 276 { 277 struct i915_priolist *p; 278 struct rb_node *rb; 279 280 rb = rb_first_cached(&execlists->queue); 281 if (!rb) 282 return INT_MIN; 283 284 /* 285 * As the priolist[] are inverted, with the highest priority in [0], 286 * we have to flip the index value to become priority. 287 */ 288 p = to_priolist(rb); 289 if (!I915_USER_PRIORITY_SHIFT) 290 return p->priority; 291 292 return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used); 293 } 294 295 static int virtual_prio(const struct intel_engine_execlists *el) 296 { 297 struct rb_node *rb = rb_first_cached(&el->virtual); 298 299 return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; 300 } 301 302 static bool need_preempt(const struct intel_engine_cs *engine, 303 const struct i915_request *rq) 304 { 305 int last_prio; 306 307 if (!intel_engine_has_semaphores(engine)) 308 return false; 309 310 /* 311 * Check if the current priority hint merits a preemption attempt. 312 * 313 * We record the highest value priority we saw during rescheduling 314 * prior to this dequeue, therefore we know that if it is strictly 315 * less than the current tail of ESLP[0], we do not need to force 316 * a preempt-to-idle cycle. 317 * 318 * However, the priority hint is a mere hint that we may need to 319 * preempt. If that hint is stale or we may be trying to preempt 320 * ourselves, ignore the request. 321 * 322 * More naturally we would write 323 * prio >= max(0, last); 324 * except that we wish to prevent triggering preemption at the same 325 * priority level: the task that is running should remain running 326 * to preserve FIFO ordering of dependencies. 327 */ 328 last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1); 329 if (engine->execlists.queue_priority_hint <= last_prio) 330 return false; 331 332 /* 333 * Check against the first request in ELSP[1], it will, thanks to the 334 * power of PI, be the highest priority of that context. 335 */ 336 if (!list_is_last(&rq->sched.link, &engine->active.requests) && 337 rq_prio(list_next_entry(rq, sched.link)) > last_prio) 338 return true; 339 340 /* 341 * If the inflight context did not trigger the preemption, then maybe 342 * it was the set of queued requests? Pick the highest priority in 343 * the queue (the first active priolist) and see if it deserves to be 344 * running instead of ELSP[0]. 345 * 346 * The highest priority request in the queue can not be either 347 * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same 348 * context, it's priority would not exceed ELSP[0] aka last_prio. 349 */ 350 return max(virtual_prio(&engine->execlists), 351 queue_prio(&engine->execlists)) > last_prio; 352 } 353 354 __maybe_unused static bool 355 assert_priority_queue(const struct i915_request *prev, 356 const struct i915_request *next) 357 { 358 /* 359 * Without preemption, the prev may refer to the still active element 360 * which we refuse to let go. 361 * 362 * Even with preemption, there are times when we think it is better not 363 * to preempt and leave an ostensibly lower priority request in flight. 364 */ 365 if (i915_request_is_active(prev)) 366 return true; 367 368 return rq_prio(prev) >= rq_prio(next); 369 } 370 371 static struct i915_request * 372 __unwind_incomplete_requests(struct intel_engine_cs *engine) 373 { 374 struct i915_request *rq, *rn, *active = NULL; 375 struct list_head *pl; 376 int prio = I915_PRIORITY_INVALID; 377 378 lockdep_assert_held(&engine->active.lock); 379 380 list_for_each_entry_safe_reverse(rq, rn, 381 &engine->active.requests, 382 sched.link) { 383 if (__i915_request_is_complete(rq)) { 384 list_del_init(&rq->sched.link); 385 continue; 386 } 387 388 __i915_request_unsubmit(rq); 389 390 GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); 391 if (rq_prio(rq) != prio) { 392 prio = rq_prio(rq); 393 pl = i915_sched_lookup_priolist(engine, prio); 394 } 395 GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); 396 397 list_move(&rq->sched.link, pl); 398 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 399 400 /* Check in case we rollback so far we wrap [size/2] */ 401 if (intel_ring_direction(rq->ring, 402 rq->tail, 403 rq->ring->tail + 8) > 0) 404 rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; 405 406 active = rq; 407 } 408 409 return active; 410 } 411 412 struct i915_request * 413 execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists) 414 { 415 struct intel_engine_cs *engine = 416 container_of(execlists, typeof(*engine), execlists); 417 418 return __unwind_incomplete_requests(engine); 419 } 420 421 static void 422 execlists_context_status_change(struct i915_request *rq, unsigned long status) 423 { 424 /* 425 * Only used when GVT-g is enabled now. When GVT-g is disabled, 426 * The compiler should eliminate this function as dead-code. 427 */ 428 if (!IS_ENABLED(CONFIG_DRM_I915_GVT)) 429 return; 430 431 atomic_notifier_call_chain(&rq->engine->context_status_notifier, 432 status, rq); 433 } 434 435 static void reset_active(struct i915_request *rq, 436 struct intel_engine_cs *engine) 437 { 438 struct intel_context * const ce = rq->context; 439 u32 head; 440 441 /* 442 * The executing context has been cancelled. We want to prevent 443 * further execution along this context and propagate the error on 444 * to anything depending on its results. 445 * 446 * In __i915_request_submit(), we apply the -EIO and remove the 447 * requests' payloads for any banned requests. But first, we must 448 * rewind the context back to the start of the incomplete request so 449 * that we do not jump back into the middle of the batch. 450 * 451 * We preserve the breadcrumbs and semaphores of the incomplete 452 * requests so that inter-timeline dependencies (i.e other timelines) 453 * remain correctly ordered. And we defer to __i915_request_submit() 454 * so that all asynchronous waits are correctly handled. 455 */ 456 ENGINE_TRACE(engine, "{ reset rq=%llx:%lld }\n", 457 rq->fence.context, rq->fence.seqno); 458 459 /* On resubmission of the active request, payload will be scrubbed */ 460 if (__i915_request_is_complete(rq)) 461 head = rq->tail; 462 else 463 head = __active_request(ce->timeline, rq, -EIO)->head; 464 head = intel_ring_wrap(ce->ring, head); 465 466 /* Scrub the context image to prevent replaying the previous batch */ 467 lrc_init_regs(ce, engine, true); 468 469 /* We've switched away, so this should be a no-op, but intent matters */ 470 ce->lrc.lrca = lrc_update_regs(ce, engine, head); 471 } 472 473 static struct intel_engine_cs * 474 __execlists_schedule_in(struct i915_request *rq) 475 { 476 struct intel_engine_cs * const engine = rq->engine; 477 struct intel_context * const ce = rq->context; 478 479 intel_context_get(ce); 480 481 if (unlikely(intel_context_is_closed(ce) && 482 !intel_engine_has_heartbeat(engine))) 483 intel_context_set_banned(ce); 484 485 if (unlikely(intel_context_is_banned(ce))) 486 reset_active(rq, engine); 487 488 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 489 lrc_check_regs(ce, engine, "before"); 490 491 if (ce->tag) { 492 /* Use a fixed tag for OA and friends */ 493 GEM_BUG_ON(ce->tag <= BITS_PER_LONG); 494 ce->lrc.ccid = ce->tag; 495 } else { 496 /* We don't need a strict matching tag, just different values */ 497 unsigned int tag = __ffs(engine->context_tag); 498 499 GEM_BUG_ON(tag >= BITS_PER_LONG); 500 __clear_bit(tag, &engine->context_tag); 501 ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32); 502 503 BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); 504 } 505 506 ce->lrc.ccid |= engine->execlists.ccid; 507 508 __intel_gt_pm_get(engine->gt); 509 if (engine->fw_domain && !engine->fw_active++) 510 intel_uncore_forcewake_get(engine->uncore, engine->fw_domain); 511 execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); 512 intel_engine_context_in(engine); 513 514 CE_TRACE(ce, "schedule-in, ccid:%x\n", ce->lrc.ccid); 515 516 return engine; 517 } 518 519 static void execlists_schedule_in(struct i915_request *rq, int idx) 520 { 521 struct intel_context * const ce = rq->context; 522 struct intel_engine_cs *old; 523 524 GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine)); 525 trace_i915_request_in(rq, idx); 526 527 old = ce->inflight; 528 if (!old) 529 old = __execlists_schedule_in(rq); 530 WRITE_ONCE(ce->inflight, ptr_inc(old)); 531 532 GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); 533 } 534 535 static void 536 resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) 537 { 538 struct intel_engine_cs *engine = rq->engine; 539 540 spin_lock_irq(&engine->active.lock); 541 542 clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 543 WRITE_ONCE(rq->engine, &ve->base); 544 ve->base.submit_request(rq); 545 546 spin_unlock_irq(&engine->active.lock); 547 } 548 549 static void kick_siblings(struct i915_request *rq, struct intel_context *ce) 550 { 551 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 552 struct intel_engine_cs *engine = rq->engine; 553 554 /* 555 * After this point, the rq may be transferred to a new sibling, so 556 * before we clear ce->inflight make sure that the context has been 557 * removed from the b->signalers and furthermore we need to make sure 558 * that the concurrent iterator in signal_irq_work is no longer 559 * following ce->signal_link. 560 */ 561 if (!list_empty(&ce->signals)) 562 intel_context_remove_breadcrumbs(ce, engine->breadcrumbs); 563 564 /* 565 * This engine is now too busy to run this virtual request, so 566 * see if we can find an alternative engine for it to execute on. 567 * Once a request has become bonded to this engine, we treat it the 568 * same as other native request. 569 */ 570 if (i915_request_in_priority_queue(rq) && 571 rq->execution_mask != engine->mask) 572 resubmit_virtual_request(rq, ve); 573 574 if (READ_ONCE(ve->request)) 575 tasklet_hi_schedule(&ve->base.execlists.tasklet); 576 } 577 578 static void __execlists_schedule_out(struct i915_request * const rq, 579 struct intel_context * const ce) 580 { 581 struct intel_engine_cs * const engine = rq->engine; 582 unsigned int ccid; 583 584 /* 585 * NB process_csb() is not under the engine->active.lock and hence 586 * schedule_out can race with schedule_in meaning that we should 587 * refrain from doing non-trivial work here. 588 */ 589 590 CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid); 591 GEM_BUG_ON(ce->inflight != engine); 592 593 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 594 lrc_check_regs(ce, engine, "after"); 595 596 /* 597 * If we have just completed this context, the engine may now be 598 * idle and we want to re-enter powersaving. 599 */ 600 if (intel_timeline_is_last(ce->timeline, rq) && 601 __i915_request_is_complete(rq)) 602 intel_engine_add_retire(engine, ce->timeline); 603 604 ccid = ce->lrc.ccid; 605 ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; 606 ccid &= GEN12_MAX_CONTEXT_HW_ID; 607 if (ccid < BITS_PER_LONG) { 608 GEM_BUG_ON(ccid == 0); 609 GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag)); 610 __set_bit(ccid - 1, &engine->context_tag); 611 } 612 613 lrc_update_runtime(ce); 614 intel_engine_context_out(engine); 615 execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT); 616 if (engine->fw_domain && !--engine->fw_active) 617 intel_uncore_forcewake_put(engine->uncore, engine->fw_domain); 618 intel_gt_pm_put_async(engine->gt); 619 620 /* 621 * If this is part of a virtual engine, its next request may 622 * have been blocked waiting for access to the active context. 623 * We have to kick all the siblings again in case we need to 624 * switch (e.g. the next request is not runnable on this 625 * engine). Hopefully, we will already have submitted the next 626 * request before the tasklet runs and do not need to rebuild 627 * each virtual tree and kick everyone again. 628 */ 629 if (ce->engine != engine) 630 kick_siblings(rq, ce); 631 632 WRITE_ONCE(ce->inflight, NULL); 633 intel_context_put(ce); 634 } 635 636 static inline void execlists_schedule_out(struct i915_request *rq) 637 { 638 struct intel_context * const ce = rq->context; 639 640 trace_i915_request_out(rq); 641 642 GEM_BUG_ON(!ce->inflight); 643 ce->inflight = ptr_dec(ce->inflight); 644 if (!__intel_context_inflight_count(ce->inflight)) 645 __execlists_schedule_out(rq, ce); 646 647 i915_request_put(rq); 648 } 649 650 static u64 execlists_update_context(struct i915_request *rq) 651 { 652 struct intel_context *ce = rq->context; 653 u64 desc = ce->lrc.desc; 654 u32 tail, prev; 655 656 /* 657 * WaIdleLiteRestore:bdw,skl 658 * 659 * We should never submit the context with the same RING_TAIL twice 660 * just in case we submit an empty ring, which confuses the HW. 661 * 662 * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of 663 * the normal request to be able to always advance the RING_TAIL on 664 * subsequent resubmissions (for lite restore). Should that fail us, 665 * and we try and submit the same tail again, force the context 666 * reload. 667 * 668 * If we need to return to a preempted context, we need to skip the 669 * lite-restore and force it to reload the RING_TAIL. Otherwise, the 670 * HW has a tendency to ignore us rewinding the TAIL to the end of 671 * an earlier request. 672 */ 673 GEM_BUG_ON(ce->lrc_reg_state[CTX_RING_TAIL] != rq->ring->tail); 674 prev = rq->ring->tail; 675 tail = intel_ring_set_tail(rq->ring, rq->tail); 676 if (unlikely(intel_ring_direction(rq->ring, tail, prev) <= 0)) 677 desc |= CTX_DESC_FORCE_RESTORE; 678 ce->lrc_reg_state[CTX_RING_TAIL] = tail; 679 rq->tail = rq->wa_tail; 680 681 /* 682 * Make sure the context image is complete before we submit it to HW. 683 * 684 * Ostensibly, writes (including the WCB) should be flushed prior to 685 * an uncached write such as our mmio register access, the empirical 686 * evidence (esp. on Braswell) suggests that the WC write into memory 687 * may not be visible to the HW prior to the completion of the UC 688 * register write and that we may begin execution from the context 689 * before its image is complete leading to invalid PD chasing. 690 */ 691 wmb(); 692 693 ce->lrc.desc &= ~CTX_DESC_FORCE_RESTORE; 694 return desc; 695 } 696 697 static void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port) 698 { 699 if (execlists->ctrl_reg) { 700 writel(lower_32_bits(desc), execlists->submit_reg + port * 2); 701 writel(upper_32_bits(desc), execlists->submit_reg + port * 2 + 1); 702 } else { 703 writel(upper_32_bits(desc), execlists->submit_reg); 704 writel(lower_32_bits(desc), execlists->submit_reg); 705 } 706 } 707 708 static __maybe_unused char * 709 dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq) 710 { 711 if (!rq) 712 return ""; 713 714 snprintf(buf, buflen, "%sccid:%x %llx:%lld%s prio %d", 715 prefix, 716 rq->context->lrc.ccid, 717 rq->fence.context, rq->fence.seqno, 718 __i915_request_is_complete(rq) ? "!" : 719 __i915_request_has_started(rq) ? "*" : 720 "", 721 rq_prio(rq)); 722 723 return buf; 724 } 725 726 static __maybe_unused noinline void 727 trace_ports(const struct intel_engine_execlists *execlists, 728 const char *msg, 729 struct i915_request * const *ports) 730 { 731 const struct intel_engine_cs *engine = 732 container_of(execlists, typeof(*engine), execlists); 733 char __maybe_unused p0[40], p1[40]; 734 735 if (!ports[0]) 736 return; 737 738 ENGINE_TRACE(engine, "%s { %s%s }\n", msg, 739 dump_port(p0, sizeof(p0), "", ports[0]), 740 dump_port(p1, sizeof(p1), ", ", ports[1])); 741 } 742 743 static bool 744 reset_in_progress(const struct intel_engine_execlists *execlists) 745 { 746 return unlikely(!__tasklet_is_enabled(&execlists->tasklet)); 747 } 748 749 static __maybe_unused noinline bool 750 assert_pending_valid(const struct intel_engine_execlists *execlists, 751 const char *msg) 752 { 753 struct intel_engine_cs *engine = 754 container_of(execlists, typeof(*engine), execlists); 755 struct i915_request * const *port, *rq; 756 struct intel_context *ce = NULL; 757 bool sentinel = false; 758 u32 ccid = -1; 759 760 trace_ports(execlists, msg, execlists->pending); 761 762 /* We may be messing around with the lists during reset, lalala */ 763 if (reset_in_progress(execlists)) 764 return true; 765 766 if (!execlists->pending[0]) { 767 GEM_TRACE_ERR("%s: Nothing pending for promotion!\n", 768 engine->name); 769 return false; 770 } 771 772 if (execlists->pending[execlists_num_ports(execlists)]) { 773 GEM_TRACE_ERR("%s: Excess pending[%d] for promotion!\n", 774 engine->name, execlists_num_ports(execlists)); 775 return false; 776 } 777 778 for (port = execlists->pending; (rq = *port); port++) { 779 unsigned long flags; 780 bool ok = true; 781 782 GEM_BUG_ON(!kref_read(&rq->fence.refcount)); 783 GEM_BUG_ON(!i915_request_is_active(rq)); 784 785 if (ce == rq->context) { 786 GEM_TRACE_ERR("%s: Dup context:%llx in pending[%zd]\n", 787 engine->name, 788 ce->timeline->fence_context, 789 port - execlists->pending); 790 return false; 791 } 792 ce = rq->context; 793 794 if (ccid == ce->lrc.ccid) { 795 GEM_TRACE_ERR("%s: Dup ccid:%x context:%llx in pending[%zd]\n", 796 engine->name, 797 ccid, ce->timeline->fence_context, 798 port - execlists->pending); 799 return false; 800 } 801 ccid = ce->lrc.ccid; 802 803 /* 804 * Sentinels are supposed to be the last request so they flush 805 * the current execution off the HW. Check that they are the only 806 * request in the pending submission. 807 */ 808 if (sentinel) { 809 GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n", 810 engine->name, 811 ce->timeline->fence_context, 812 port - execlists->pending); 813 return false; 814 } 815 sentinel = i915_request_has_sentinel(rq); 816 817 /* 818 * We want virtual requests to only be in the first slot so 819 * that they are never stuck behind a hog and can be immediately 820 * transferred onto the next idle engine. 821 */ 822 if (rq->execution_mask != engine->mask && 823 port != execlists->pending) { 824 GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n", 825 engine->name, 826 ce->timeline->fence_context, 827 port - execlists->pending); 828 return false; 829 } 830 831 /* Hold tightly onto the lock to prevent concurrent retires! */ 832 if (!spin_trylock_irqsave(&rq->lock, flags)) 833 continue; 834 835 if (__i915_request_is_complete(rq)) 836 goto unlock; 837 838 if (i915_active_is_idle(&ce->active) && 839 !intel_context_is_barrier(ce)) { 840 GEM_TRACE_ERR("%s: Inactive context:%llx in pending[%zd]\n", 841 engine->name, 842 ce->timeline->fence_context, 843 port - execlists->pending); 844 ok = false; 845 goto unlock; 846 } 847 848 if (!i915_vma_is_pinned(ce->state)) { 849 GEM_TRACE_ERR("%s: Unpinned context:%llx in pending[%zd]\n", 850 engine->name, 851 ce->timeline->fence_context, 852 port - execlists->pending); 853 ok = false; 854 goto unlock; 855 } 856 857 if (!i915_vma_is_pinned(ce->ring->vma)) { 858 GEM_TRACE_ERR("%s: Unpinned ring:%llx in pending[%zd]\n", 859 engine->name, 860 ce->timeline->fence_context, 861 port - execlists->pending); 862 ok = false; 863 goto unlock; 864 } 865 866 unlock: 867 spin_unlock_irqrestore(&rq->lock, flags); 868 if (!ok) 869 return false; 870 } 871 872 return ce; 873 } 874 875 static void execlists_submit_ports(struct intel_engine_cs *engine) 876 { 877 struct intel_engine_execlists *execlists = &engine->execlists; 878 unsigned int n; 879 880 GEM_BUG_ON(!assert_pending_valid(execlists, "submit")); 881 882 /* 883 * We can skip acquiring intel_runtime_pm_get() here as it was taken 884 * on our behalf by the request (see i915_gem_mark_busy()) and it will 885 * not be relinquished until the device is idle (see 886 * i915_gem_idle_work_handler()). As a precaution, we make sure 887 * that all ELSP are drained i.e. we have processed the CSB, 888 * before allowing ourselves to idle and calling intel_runtime_pm_put(). 889 */ 890 GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 891 892 /* 893 * ELSQ note: the submit queue is not cleared after being submitted 894 * to the HW so we need to make sure we always clean it up. This is 895 * currently ensured by the fact that we always write the same number 896 * of elsq entries, keep this in mind before changing the loop below. 897 */ 898 for (n = execlists_num_ports(execlists); n--; ) { 899 struct i915_request *rq = execlists->pending[n]; 900 901 write_desc(execlists, 902 rq ? execlists_update_context(rq) : 0, 903 n); 904 } 905 906 /* we need to manually load the submit queue */ 907 if (execlists->ctrl_reg) 908 writel(EL_CTRL_LOAD, execlists->ctrl_reg); 909 } 910 911 static bool ctx_single_port_submission(const struct intel_context *ce) 912 { 913 return (IS_ENABLED(CONFIG_DRM_I915_GVT) && 914 intel_context_force_single_submission(ce)); 915 } 916 917 static bool can_merge_ctx(const struct intel_context *prev, 918 const struct intel_context *next) 919 { 920 if (prev != next) 921 return false; 922 923 if (ctx_single_port_submission(prev)) 924 return false; 925 926 return true; 927 } 928 929 static unsigned long i915_request_flags(const struct i915_request *rq) 930 { 931 return READ_ONCE(rq->fence.flags); 932 } 933 934 static bool can_merge_rq(const struct i915_request *prev, 935 const struct i915_request *next) 936 { 937 GEM_BUG_ON(prev == next); 938 GEM_BUG_ON(!assert_priority_queue(prev, next)); 939 940 /* 941 * We do not submit known completed requests. Therefore if the next 942 * request is already completed, we can pretend to merge it in 943 * with the previous context (and we will skip updating the ELSP 944 * and tracking). Thus hopefully keeping the ELSP full with active 945 * contexts, despite the best efforts of preempt-to-busy to confuse 946 * us. 947 */ 948 if (__i915_request_is_complete(next)) 949 return true; 950 951 if (unlikely((i915_request_flags(prev) ^ i915_request_flags(next)) & 952 (BIT(I915_FENCE_FLAG_NOPREEMPT) | 953 BIT(I915_FENCE_FLAG_SENTINEL)))) 954 return false; 955 956 if (!can_merge_ctx(prev->context, next->context)) 957 return false; 958 959 GEM_BUG_ON(i915_seqno_passed(prev->fence.seqno, next->fence.seqno)); 960 return true; 961 } 962 963 static bool virtual_matches(const struct virtual_engine *ve, 964 const struct i915_request *rq, 965 const struct intel_engine_cs *engine) 966 { 967 const struct intel_engine_cs *inflight; 968 969 if (!rq) 970 return false; 971 972 if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */ 973 return false; 974 975 /* 976 * We track when the HW has completed saving the context image 977 * (i.e. when we have seen the final CS event switching out of 978 * the context) and must not overwrite the context image before 979 * then. This restricts us to only using the active engine 980 * while the previous virtualized request is inflight (so 981 * we reuse the register offsets). This is a very small 982 * hystersis on the greedy seelction algorithm. 983 */ 984 inflight = intel_context_inflight(&ve->context); 985 if (inflight && inflight != engine) 986 return false; 987 988 return true; 989 } 990 991 static struct virtual_engine * 992 first_virtual_engine(struct intel_engine_cs *engine) 993 { 994 struct intel_engine_execlists *el = &engine->execlists; 995 struct rb_node *rb = rb_first_cached(&el->virtual); 996 997 while (rb) { 998 struct virtual_engine *ve = 999 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 1000 struct i915_request *rq = READ_ONCE(ve->request); 1001 1002 /* lazily cleanup after another engine handled rq */ 1003 if (!rq || !virtual_matches(ve, rq, engine)) { 1004 rb_erase_cached(rb, &el->virtual); 1005 RB_CLEAR_NODE(rb); 1006 rb = rb_first_cached(&el->virtual); 1007 continue; 1008 } 1009 1010 return ve; 1011 } 1012 1013 return NULL; 1014 } 1015 1016 static void virtual_xfer_context(struct virtual_engine *ve, 1017 struct intel_engine_cs *engine) 1018 { 1019 unsigned int n; 1020 1021 if (likely(engine == ve->siblings[0])) 1022 return; 1023 1024 GEM_BUG_ON(READ_ONCE(ve->context.inflight)); 1025 if (!intel_engine_has_relative_mmio(engine)) 1026 lrc_update_offsets(&ve->context, engine); 1027 1028 /* 1029 * Move the bound engine to the top of the list for 1030 * future execution. We then kick this tasklet first 1031 * before checking others, so that we preferentially 1032 * reuse this set of bound registers. 1033 */ 1034 for (n = 1; n < ve->num_siblings; n++) { 1035 if (ve->siblings[n] == engine) { 1036 swap(ve->siblings[n], ve->siblings[0]); 1037 break; 1038 } 1039 } 1040 } 1041 1042 static void defer_request(struct i915_request *rq, struct list_head * const pl) 1043 { 1044 LIST_HEAD(list); 1045 1046 /* 1047 * We want to move the interrupted request to the back of 1048 * the round-robin list (i.e. its priority level), but 1049 * in doing so, we must then move all requests that were in 1050 * flight and were waiting for the interrupted request to 1051 * be run after it again. 1052 */ 1053 do { 1054 struct i915_dependency *p; 1055 1056 GEM_BUG_ON(i915_request_is_active(rq)); 1057 list_move_tail(&rq->sched.link, pl); 1058 1059 for_each_waiter(p, rq) { 1060 struct i915_request *w = 1061 container_of(p->waiter, typeof(*w), sched); 1062 1063 if (p->flags & I915_DEPENDENCY_WEAK) 1064 continue; 1065 1066 /* Leave semaphores spinning on the other engines */ 1067 if (w->engine != rq->engine) 1068 continue; 1069 1070 /* No waiter should start before its signaler */ 1071 GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) && 1072 __i915_request_has_started(w) && 1073 !__i915_request_is_complete(rq)); 1074 1075 GEM_BUG_ON(i915_request_is_active(w)); 1076 if (!i915_request_is_ready(w)) 1077 continue; 1078 1079 if (rq_prio(w) < rq_prio(rq)) 1080 continue; 1081 1082 GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); 1083 list_move_tail(&w->sched.link, &list); 1084 } 1085 1086 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 1087 } while (rq); 1088 } 1089 1090 static void defer_active(struct intel_engine_cs *engine) 1091 { 1092 struct i915_request *rq; 1093 1094 rq = __unwind_incomplete_requests(engine); 1095 if (!rq) 1096 return; 1097 1098 defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq))); 1099 } 1100 1101 static bool 1102 timeslice_yield(const struct intel_engine_execlists *el, 1103 const struct i915_request *rq) 1104 { 1105 /* 1106 * Once bitten, forever smitten! 1107 * 1108 * If the active context ever busy-waited on a semaphore, 1109 * it will be treated as a hog until the end of its timeslice (i.e. 1110 * until it is scheduled out and replaced by a new submission, 1111 * possibly even its own lite-restore). The HW only sends an interrupt 1112 * on the first miss, and we do know if that semaphore has been 1113 * signaled, or even if it is now stuck on another semaphore. Play 1114 * safe, yield if it might be stuck -- it will be given a fresh 1115 * timeslice in the near future. 1116 */ 1117 return rq->context->lrc.ccid == READ_ONCE(el->yield); 1118 } 1119 1120 static bool needs_timeslice(const struct intel_engine_cs *engine, 1121 const struct i915_request *rq) 1122 { 1123 if (!intel_engine_has_timeslices(engine)) 1124 return false; 1125 1126 /* If not currently active, or about to switch, wait for next event */ 1127 if (!rq || __i915_request_is_complete(rq)) 1128 return false; 1129 1130 /* We do not need to start the timeslice until after the ACK */ 1131 if (READ_ONCE(engine->execlists.pending[0])) 1132 return false; 1133 1134 /* If ELSP[1] is occupied, always check to see if worth slicing */ 1135 if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) { 1136 ENGINE_TRACE(engine, "timeslice required for second inflight context\n"); 1137 return true; 1138 } 1139 1140 /* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */ 1141 if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) { 1142 ENGINE_TRACE(engine, "timeslice required for queue\n"); 1143 return true; 1144 } 1145 1146 if (!RB_EMPTY_ROOT(&engine->execlists.virtual.rb_root)) { 1147 ENGINE_TRACE(engine, "timeslice required for virtual\n"); 1148 return true; 1149 } 1150 1151 return false; 1152 } 1153 1154 static bool 1155 timeslice_expired(struct intel_engine_cs *engine, const struct i915_request *rq) 1156 { 1157 const struct intel_engine_execlists *el = &engine->execlists; 1158 1159 if (i915_request_has_nopreempt(rq) && __i915_request_has_started(rq)) 1160 return false; 1161 1162 if (!needs_timeslice(engine, rq)) 1163 return false; 1164 1165 return timer_expired(&el->timer) || timeslice_yield(el, rq); 1166 } 1167 1168 static unsigned long timeslice(const struct intel_engine_cs *engine) 1169 { 1170 return READ_ONCE(engine->props.timeslice_duration_ms); 1171 } 1172 1173 static void start_timeslice(struct intel_engine_cs *engine) 1174 { 1175 struct intel_engine_execlists *el = &engine->execlists; 1176 unsigned long duration; 1177 1178 /* Disable the timer if there is nothing to switch to */ 1179 duration = 0; 1180 if (needs_timeslice(engine, *el->active)) { 1181 /* Avoid continually prolonging an active timeslice */ 1182 if (timer_active(&el->timer)) { 1183 /* 1184 * If we just submitted a new ELSP after an old 1185 * context, that context may have already consumed 1186 * its timeslice, so recheck. 1187 */ 1188 if (!timer_pending(&el->timer)) 1189 tasklet_hi_schedule(&el->tasklet); 1190 return; 1191 } 1192 1193 duration = timeslice(engine); 1194 } 1195 1196 set_timer_ms(&el->timer, duration); 1197 } 1198 1199 static void record_preemption(struct intel_engine_execlists *execlists) 1200 { 1201 (void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++); 1202 } 1203 1204 static unsigned long active_preempt_timeout(struct intel_engine_cs *engine, 1205 const struct i915_request *rq) 1206 { 1207 if (!rq) 1208 return 0; 1209 1210 /* Force a fast reset for terminated contexts (ignoring sysfs!) */ 1211 if (unlikely(intel_context_is_banned(rq->context))) 1212 return 1; 1213 1214 return READ_ONCE(engine->props.preempt_timeout_ms); 1215 } 1216 1217 static void set_preempt_timeout(struct intel_engine_cs *engine, 1218 const struct i915_request *rq) 1219 { 1220 if (!intel_engine_has_preempt_reset(engine)) 1221 return; 1222 1223 set_timer_ms(&engine->execlists.preempt, 1224 active_preempt_timeout(engine, rq)); 1225 } 1226 1227 static bool completed(const struct i915_request *rq) 1228 { 1229 if (i915_request_has_sentinel(rq)) 1230 return false; 1231 1232 return __i915_request_is_complete(rq); 1233 } 1234 1235 static void execlists_dequeue(struct intel_engine_cs *engine) 1236 { 1237 struct intel_engine_execlists * const execlists = &engine->execlists; 1238 struct i915_request **port = execlists->pending; 1239 struct i915_request ** const last_port = port + execlists->port_mask; 1240 struct i915_request *last, * const *active; 1241 struct virtual_engine *ve; 1242 struct rb_node *rb; 1243 bool submit = false; 1244 1245 /* 1246 * Hardware submission is through 2 ports. Conceptually each port 1247 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is 1248 * static for a context, and unique to each, so we only execute 1249 * requests belonging to a single context from each ring. RING_HEAD 1250 * is maintained by the CS in the context image, it marks the place 1251 * where it got up to last time, and through RING_TAIL we tell the CS 1252 * where we want to execute up to this time. 1253 * 1254 * In this list the requests are in order of execution. Consecutive 1255 * requests from the same context are adjacent in the ringbuffer. We 1256 * can combine these requests into a single RING_TAIL update: 1257 * 1258 * RING_HEAD...req1...req2 1259 * ^- RING_TAIL 1260 * since to execute req2 the CS must first execute req1. 1261 * 1262 * Our goal then is to point each port to the end of a consecutive 1263 * sequence of requests as being the most optimal (fewest wake ups 1264 * and context switches) submission. 1265 */ 1266 1267 spin_lock(&engine->active.lock); 1268 1269 /* 1270 * If the queue is higher priority than the last 1271 * request in the currently active context, submit afresh. 1272 * We will resubmit again afterwards in case we need to split 1273 * the active context to interject the preemption request, 1274 * i.e. we will retrigger preemption following the ack in case 1275 * of trouble. 1276 * 1277 */ 1278 active = execlists->active; 1279 while ((last = *active) && completed(last)) 1280 active++; 1281 1282 if (last) { 1283 if (need_preempt(engine, last)) { 1284 ENGINE_TRACE(engine, 1285 "preempting last=%llx:%lld, prio=%d, hint=%d\n", 1286 last->fence.context, 1287 last->fence.seqno, 1288 last->sched.attr.priority, 1289 execlists->queue_priority_hint); 1290 record_preemption(execlists); 1291 1292 /* 1293 * Don't let the RING_HEAD advance past the breadcrumb 1294 * as we unwind (and until we resubmit) so that we do 1295 * not accidentally tell it to go backwards. 1296 */ 1297 ring_set_paused(engine, 1); 1298 1299 /* 1300 * Note that we have not stopped the GPU at this point, 1301 * so we are unwinding the incomplete requests as they 1302 * remain inflight and so by the time we do complete 1303 * the preemption, some of the unwound requests may 1304 * complete! 1305 */ 1306 __unwind_incomplete_requests(engine); 1307 1308 last = NULL; 1309 } else if (timeslice_expired(engine, last)) { 1310 ENGINE_TRACE(engine, 1311 "expired:%s last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n", 1312 yesno(timer_expired(&execlists->timer)), 1313 last->fence.context, last->fence.seqno, 1314 rq_prio(last), 1315 execlists->queue_priority_hint, 1316 yesno(timeslice_yield(execlists, last))); 1317 1318 /* 1319 * Consume this timeslice; ensure we start a new one. 1320 * 1321 * The timeslice expired, and we will unwind the 1322 * running contexts and recompute the next ELSP. 1323 * If that submit will be the same pair of contexts 1324 * (due to dependency ordering), we will skip the 1325 * submission. If we don't cancel the timer now, 1326 * we will see that the timer has expired and 1327 * reschedule the tasklet; continually until the 1328 * next context switch or other preeemption event. 1329 * 1330 * Since we have decided to reschedule based on 1331 * consumption of this timeslice, if we submit the 1332 * same context again, grant it a full timeslice. 1333 */ 1334 cancel_timer(&execlists->timer); 1335 ring_set_paused(engine, 1); 1336 defer_active(engine); 1337 1338 /* 1339 * Unlike for preemption, if we rewind and continue 1340 * executing the same context as previously active, 1341 * the order of execution will remain the same and 1342 * the tail will only advance. We do not need to 1343 * force a full context restore, as a lite-restore 1344 * is sufficient to resample the monotonic TAIL. 1345 * 1346 * If we switch to any other context, similarly we 1347 * will not rewind TAIL of current context, and 1348 * normal save/restore will preserve state and allow 1349 * us to later continue executing the same request. 1350 */ 1351 last = NULL; 1352 } else { 1353 /* 1354 * Otherwise if we already have a request pending 1355 * for execution after the current one, we can 1356 * just wait until the next CS event before 1357 * queuing more. In either case we will force a 1358 * lite-restore preemption event, but if we wait 1359 * we hopefully coalesce several updates into a single 1360 * submission. 1361 */ 1362 if (active[1]) { 1363 /* 1364 * Even if ELSP[1] is occupied and not worthy 1365 * of timeslices, our queue might be. 1366 */ 1367 spin_unlock(&engine->active.lock); 1368 return; 1369 } 1370 } 1371 } 1372 1373 /* XXX virtual is always taking precedence */ 1374 while ((ve = first_virtual_engine(engine))) { 1375 struct i915_request *rq; 1376 1377 spin_lock(&ve->base.active.lock); 1378 1379 rq = ve->request; 1380 if (unlikely(!virtual_matches(ve, rq, engine))) 1381 goto unlock; /* lost the race to a sibling */ 1382 1383 GEM_BUG_ON(rq->engine != &ve->base); 1384 GEM_BUG_ON(rq->context != &ve->context); 1385 1386 if (unlikely(rq_prio(rq) < queue_prio(execlists))) { 1387 spin_unlock(&ve->base.active.lock); 1388 break; 1389 } 1390 1391 if (last && !can_merge_rq(last, rq)) { 1392 spin_unlock(&ve->base.active.lock); 1393 spin_unlock(&engine->active.lock); 1394 return; /* leave this for another sibling */ 1395 } 1396 1397 ENGINE_TRACE(engine, 1398 "virtual rq=%llx:%lld%s, new engine? %s\n", 1399 rq->fence.context, 1400 rq->fence.seqno, 1401 __i915_request_is_complete(rq) ? "!" : 1402 __i915_request_has_started(rq) ? "*" : 1403 "", 1404 yesno(engine != ve->siblings[0])); 1405 1406 WRITE_ONCE(ve->request, NULL); 1407 WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN); 1408 1409 rb = &ve->nodes[engine->id].rb; 1410 rb_erase_cached(rb, &execlists->virtual); 1411 RB_CLEAR_NODE(rb); 1412 1413 GEM_BUG_ON(!(rq->execution_mask & engine->mask)); 1414 WRITE_ONCE(rq->engine, engine); 1415 1416 if (__i915_request_submit(rq)) { 1417 /* 1418 * Only after we confirm that we will submit 1419 * this request (i.e. it has not already 1420 * completed), do we want to update the context. 1421 * 1422 * This serves two purposes. It avoids 1423 * unnecessary work if we are resubmitting an 1424 * already completed request after timeslicing. 1425 * But more importantly, it prevents us altering 1426 * ve->siblings[] on an idle context, where 1427 * we may be using ve->siblings[] in 1428 * virtual_context_enter / virtual_context_exit. 1429 */ 1430 virtual_xfer_context(ve, engine); 1431 GEM_BUG_ON(ve->siblings[0] != engine); 1432 1433 submit = true; 1434 last = rq; 1435 } 1436 1437 i915_request_put(rq); 1438 unlock: 1439 spin_unlock(&ve->base.active.lock); 1440 1441 /* 1442 * Hmm, we have a bunch of virtual engine requests, 1443 * but the first one was already completed (thanks 1444 * preempt-to-busy!). Keep looking at the veng queue 1445 * until we have no more relevant requests (i.e. 1446 * the normal submit queue has higher priority). 1447 */ 1448 if (submit) 1449 break; 1450 } 1451 1452 while ((rb = rb_first_cached(&execlists->queue))) { 1453 struct i915_priolist *p = to_priolist(rb); 1454 struct i915_request *rq, *rn; 1455 int i; 1456 1457 priolist_for_each_request_consume(rq, rn, p, i) { 1458 bool merge = true; 1459 1460 /* 1461 * Can we combine this request with the current port? 1462 * It has to be the same context/ringbuffer and not 1463 * have any exceptions (e.g. GVT saying never to 1464 * combine contexts). 1465 * 1466 * If we can combine the requests, we can execute both 1467 * by updating the RING_TAIL to point to the end of the 1468 * second request, and so we never need to tell the 1469 * hardware about the first. 1470 */ 1471 if (last && !can_merge_rq(last, rq)) { 1472 /* 1473 * If we are on the second port and cannot 1474 * combine this request with the last, then we 1475 * are done. 1476 */ 1477 if (port == last_port) 1478 goto done; 1479 1480 /* 1481 * We must not populate both ELSP[] with the 1482 * same LRCA, i.e. we must submit 2 different 1483 * contexts if we submit 2 ELSP. 1484 */ 1485 if (last->context == rq->context) 1486 goto done; 1487 1488 if (i915_request_has_sentinel(last)) 1489 goto done; 1490 1491 /* 1492 * We avoid submitting virtual requests into 1493 * the secondary ports so that we can migrate 1494 * the request immediately to another engine 1495 * rather than wait for the primary request. 1496 */ 1497 if (rq->execution_mask != engine->mask) 1498 goto done; 1499 1500 /* 1501 * If GVT overrides us we only ever submit 1502 * port[0], leaving port[1] empty. Note that we 1503 * also have to be careful that we don't queue 1504 * the same context (even though a different 1505 * request) to the second port. 1506 */ 1507 if (ctx_single_port_submission(last->context) || 1508 ctx_single_port_submission(rq->context)) 1509 goto done; 1510 1511 merge = false; 1512 } 1513 1514 if (__i915_request_submit(rq)) { 1515 if (!merge) { 1516 *port++ = i915_request_get(last); 1517 last = NULL; 1518 } 1519 1520 GEM_BUG_ON(last && 1521 !can_merge_ctx(last->context, 1522 rq->context)); 1523 GEM_BUG_ON(last && 1524 i915_seqno_passed(last->fence.seqno, 1525 rq->fence.seqno)); 1526 1527 submit = true; 1528 last = rq; 1529 } 1530 } 1531 1532 rb_erase_cached(&p->node, &execlists->queue); 1533 i915_priolist_free(p); 1534 } 1535 done: 1536 *port++ = i915_request_get(last); 1537 1538 /* 1539 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer. 1540 * 1541 * We choose the priority hint such that if we add a request of greater 1542 * priority than this, we kick the submission tasklet to decide on 1543 * the right order of submitting the requests to hardware. We must 1544 * also be prepared to reorder requests as they are in-flight on the 1545 * HW. We derive the priority hint then as the first "hole" in 1546 * the HW submission ports and if there are no available slots, 1547 * the priority of the lowest executing request, i.e. last. 1548 * 1549 * When we do receive a higher priority request ready to run from the 1550 * user, see queue_request(), the priority hint is bumped to that 1551 * request triggering preemption on the next dequeue (or subsequent 1552 * interrupt for secondary ports). 1553 */ 1554 execlists->queue_priority_hint = queue_prio(execlists); 1555 spin_unlock(&engine->active.lock); 1556 1557 /* 1558 * We can skip poking the HW if we ended up with exactly the same set 1559 * of requests as currently running, e.g. trying to timeslice a pair 1560 * of ordered contexts. 1561 */ 1562 if (submit && 1563 memcmp(active, 1564 execlists->pending, 1565 (port - execlists->pending) * sizeof(*port))) { 1566 *port = NULL; 1567 while (port-- != execlists->pending) 1568 execlists_schedule_in(*port, port - execlists->pending); 1569 1570 WRITE_ONCE(execlists->yield, -1); 1571 set_preempt_timeout(engine, *active); 1572 execlists_submit_ports(engine); 1573 } else { 1574 ring_set_paused(engine, 0); 1575 while (port-- != execlists->pending) 1576 i915_request_put(*port); 1577 *execlists->pending = NULL; 1578 } 1579 } 1580 1581 static void execlists_dequeue_irq(struct intel_engine_cs *engine) 1582 { 1583 local_irq_disable(); /* Suspend interrupts across request submission */ 1584 execlists_dequeue(engine); 1585 local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */ 1586 } 1587 1588 static void clear_ports(struct i915_request **ports, int count) 1589 { 1590 memset_p((void **)ports, NULL, count); 1591 } 1592 1593 static void 1594 copy_ports(struct i915_request **dst, struct i915_request **src, int count) 1595 { 1596 /* A memcpy_p() would be very useful here! */ 1597 while (count--) 1598 WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ 1599 } 1600 1601 static struct i915_request ** 1602 cancel_port_requests(struct intel_engine_execlists * const execlists, 1603 struct i915_request **inactive) 1604 { 1605 struct i915_request * const *port; 1606 1607 for (port = execlists->pending; *port; port++) 1608 *inactive++ = *port; 1609 clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending)); 1610 1611 /* Mark the end of active before we overwrite *active */ 1612 for (port = xchg(&execlists->active, execlists->pending); *port; port++) 1613 *inactive++ = *port; 1614 clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight)); 1615 1616 smp_wmb(); /* complete the seqlock for execlists_active() */ 1617 WRITE_ONCE(execlists->active, execlists->inflight); 1618 1619 /* Having cancelled all outstanding process_csb(), stop their timers */ 1620 GEM_BUG_ON(execlists->pending[0]); 1621 cancel_timer(&execlists->timer); 1622 cancel_timer(&execlists->preempt); 1623 1624 return inactive; 1625 } 1626 1627 static void invalidate_csb_entries(const u64 *first, const u64 *last) 1628 { 1629 clflush((void *)first); 1630 clflush((void *)last); 1631 } 1632 1633 /* 1634 * Starting with Gen12, the status has a new format: 1635 * 1636 * bit 0: switched to new queue 1637 * bit 1: reserved 1638 * bit 2: semaphore wait mode (poll or signal), only valid when 1639 * switch detail is set to "wait on semaphore" 1640 * bits 3-5: engine class 1641 * bits 6-11: engine instance 1642 * bits 12-14: reserved 1643 * bits 15-25: sw context id of the lrc the GT switched to 1644 * bits 26-31: sw counter of the lrc the GT switched to 1645 * bits 32-35: context switch detail 1646 * - 0: ctx complete 1647 * - 1: wait on sync flip 1648 * - 2: wait on vblank 1649 * - 3: wait on scanline 1650 * - 4: wait on semaphore 1651 * - 5: context preempted (not on SEMAPHORE_WAIT or 1652 * WAIT_FOR_EVENT) 1653 * bit 36: reserved 1654 * bits 37-43: wait detail (for switch detail 1 to 4) 1655 * bits 44-46: reserved 1656 * bits 47-57: sw context id of the lrc the GT switched away from 1657 * bits 58-63: sw counter of the lrc the GT switched away from 1658 */ 1659 static bool gen12_csb_parse(const u64 csb) 1660 { 1661 bool ctx_away_valid = GEN12_CSB_CTX_VALID(upper_32_bits(csb)); 1662 bool new_queue = 1663 lower_32_bits(csb) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE; 1664 1665 /* 1666 * The context switch detail is not guaranteed to be 5 when a preemption 1667 * occurs, so we can't just check for that. The check below works for 1668 * all the cases we care about, including preemptions of WAIT 1669 * instructions and lite-restore. Preempt-to-idle via the CTRL register 1670 * would require some extra handling, but we don't support that. 1671 */ 1672 if (!ctx_away_valid || new_queue) { 1673 GEM_BUG_ON(!GEN12_CSB_CTX_VALID(lower_32_bits(csb))); 1674 return true; 1675 } 1676 1677 /* 1678 * switch detail = 5 is covered by the case above and we do not expect a 1679 * context switch on an unsuccessful wait instruction since we always 1680 * use polling mode. 1681 */ 1682 GEM_BUG_ON(GEN12_CTX_SWITCH_DETAIL(upper_32_bits(csb))); 1683 return false; 1684 } 1685 1686 static bool gen8_csb_parse(const u64 csb) 1687 { 1688 return csb & (GEN8_CTX_STATUS_IDLE_ACTIVE | GEN8_CTX_STATUS_PREEMPTED); 1689 } 1690 1691 static noinline u64 1692 wa_csb_read(const struct intel_engine_cs *engine, u64 * const csb) 1693 { 1694 u64 entry; 1695 1696 /* 1697 * Reading from the HWSP has one particular advantage: we can detect 1698 * a stale entry. Since the write into HWSP is broken, we have no reason 1699 * to trust the HW at all, the mmio entry may equally be unordered, so 1700 * we prefer the path that is self-checking and as a last resort, 1701 * return the mmio value. 1702 * 1703 * tgl,dg1:HSDES#22011327657 1704 */ 1705 preempt_disable(); 1706 if (wait_for_atomic_us((entry = READ_ONCE(*csb)) != -1, 10)) { 1707 int idx = csb - engine->execlists.csb_status; 1708 int status; 1709 1710 status = GEN8_EXECLISTS_STATUS_BUF; 1711 if (idx >= 6) { 1712 status = GEN11_EXECLISTS_STATUS_BUF2; 1713 idx -= 6; 1714 } 1715 status += sizeof(u64) * idx; 1716 1717 entry = intel_uncore_read64(engine->uncore, 1718 _MMIO(engine->mmio_base + status)); 1719 } 1720 preempt_enable(); 1721 1722 return entry; 1723 } 1724 1725 static u64 csb_read(const struct intel_engine_cs *engine, u64 * const csb) 1726 { 1727 u64 entry = READ_ONCE(*csb); 1728 1729 /* 1730 * Unfortunately, the GPU does not always serialise its write 1731 * of the CSB entries before its write of the CSB pointer, at least 1732 * from the perspective of the CPU, using what is known as a Global 1733 * Observation Point. We may read a new CSB tail pointer, but then 1734 * read the stale CSB entries, causing us to misinterpret the 1735 * context-switch events, and eventually declare the GPU hung. 1736 * 1737 * icl:HSDES#1806554093 1738 * tgl:HSDES#22011248461 1739 */ 1740 if (unlikely(entry == -1)) 1741 entry = wa_csb_read(engine, csb); 1742 1743 /* Consume this entry so that we can spot its future reuse. */ 1744 WRITE_ONCE(*csb, -1); 1745 1746 /* ELSP is an implicit wmb() before the GPU wraps and overwrites csb */ 1747 return entry; 1748 } 1749 1750 static void new_timeslice(struct intel_engine_execlists *el) 1751 { 1752 /* By cancelling, we will start afresh in start_timeslice() */ 1753 cancel_timer(&el->timer); 1754 } 1755 1756 static struct i915_request ** 1757 process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) 1758 { 1759 struct intel_engine_execlists * const execlists = &engine->execlists; 1760 u64 * const buf = execlists->csb_status; 1761 const u8 num_entries = execlists->csb_size; 1762 struct i915_request **prev; 1763 u8 head, tail; 1764 1765 /* 1766 * As we modify our execlists state tracking we require exclusive 1767 * access. Either we are inside the tasklet, or the tasklet is disabled 1768 * and we assume that is only inside the reset paths and so serialised. 1769 */ 1770 GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) && 1771 !reset_in_progress(execlists)); 1772 GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine)); 1773 1774 /* 1775 * Note that csb_write, csb_status may be either in HWSP or mmio. 1776 * When reading from the csb_write mmio register, we have to be 1777 * careful to only use the GEN8_CSB_WRITE_PTR portion, which is 1778 * the low 4bits. As it happens we know the next 4bits are always 1779 * zero and so we can simply masked off the low u8 of the register 1780 * and treat it identically to reading from the HWSP (without having 1781 * to use explicit shifting and masking, and probably bifurcating 1782 * the code to handle the legacy mmio read). 1783 */ 1784 head = execlists->csb_head; 1785 tail = READ_ONCE(*execlists->csb_write); 1786 if (unlikely(head == tail)) 1787 return inactive; 1788 1789 /* 1790 * We will consume all events from HW, or at least pretend to. 1791 * 1792 * The sequence of events from the HW is deterministic, and derived 1793 * from our writes to the ELSP, with a smidgen of variability for 1794 * the arrival of the asynchronous requests wrt to the inflight 1795 * execution. If the HW sends an event that does not correspond with 1796 * the one we are expecting, we have to abandon all hope as we lose 1797 * all tracking of what the engine is actually executing. We will 1798 * only detect we are out of sequence with the HW when we get an 1799 * 'impossible' event because we have already drained our own 1800 * preemption/promotion queue. If this occurs, we know that we likely 1801 * lost track of execution earlier and must unwind and restart, the 1802 * simplest way is by stop processing the event queue and force the 1803 * engine to reset. 1804 */ 1805 execlists->csb_head = tail; 1806 ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail); 1807 1808 /* 1809 * Hopefully paired with a wmb() in HW! 1810 * 1811 * We must complete the read of the write pointer before any reads 1812 * from the CSB, so that we do not see stale values. Without an rmb 1813 * (lfence) the HW may speculatively perform the CSB[] reads *before* 1814 * we perform the READ_ONCE(*csb_write). 1815 */ 1816 rmb(); 1817 1818 /* Remember who was last running under the timer */ 1819 prev = inactive; 1820 *prev = NULL; 1821 1822 do { 1823 bool promote; 1824 u64 csb; 1825 1826 if (++head == num_entries) 1827 head = 0; 1828 1829 /* 1830 * We are flying near dragons again. 1831 * 1832 * We hold a reference to the request in execlist_port[] 1833 * but no more than that. We are operating in softirq 1834 * context and so cannot hold any mutex or sleep. That 1835 * prevents us stopping the requests we are processing 1836 * in port[] from being retired simultaneously (the 1837 * breadcrumb will be complete before we see the 1838 * context-switch). As we only hold the reference to the 1839 * request, any pointer chasing underneath the request 1840 * is subject to a potential use-after-free. Thus we 1841 * store all of the bookkeeping within port[] as 1842 * required, and avoid using unguarded pointers beneath 1843 * request itself. The same applies to the atomic 1844 * status notifier. 1845 */ 1846 1847 csb = csb_read(engine, buf + head); 1848 ENGINE_TRACE(engine, "csb[%d]: status=0x%08x:0x%08x\n", 1849 head, upper_32_bits(csb), lower_32_bits(csb)); 1850 1851 if (INTEL_GEN(engine->i915) >= 12) 1852 promote = gen12_csb_parse(csb); 1853 else 1854 promote = gen8_csb_parse(csb); 1855 if (promote) { 1856 struct i915_request * const *old = execlists->active; 1857 1858 if (GEM_WARN_ON(!*execlists->pending)) { 1859 execlists->error_interrupt |= ERROR_CSB; 1860 break; 1861 } 1862 1863 ring_set_paused(engine, 0); 1864 1865 /* Point active to the new ELSP; prevent overwriting */ 1866 WRITE_ONCE(execlists->active, execlists->pending); 1867 smp_wmb(); /* notify execlists_active() */ 1868 1869 /* cancel old inflight, prepare for switch */ 1870 trace_ports(execlists, "preempted", old); 1871 while (*old) 1872 *inactive++ = *old++; 1873 1874 /* switch pending to inflight */ 1875 GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); 1876 copy_ports(execlists->inflight, 1877 execlists->pending, 1878 execlists_num_ports(execlists)); 1879 smp_wmb(); /* complete the seqlock */ 1880 WRITE_ONCE(execlists->active, execlists->inflight); 1881 1882 /* XXX Magic delay for tgl */ 1883 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 1884 1885 WRITE_ONCE(execlists->pending[0], NULL); 1886 } else { 1887 if (GEM_WARN_ON(!*execlists->active)) { 1888 execlists->error_interrupt |= ERROR_CSB; 1889 break; 1890 } 1891 1892 /* port0 completed, advanced to port1 */ 1893 trace_ports(execlists, "completed", execlists->active); 1894 1895 /* 1896 * We rely on the hardware being strongly 1897 * ordered, that the breadcrumb write is 1898 * coherent (visible from the CPU) before the 1899 * user interrupt is processed. One might assume 1900 * that the breadcrumb write being before the 1901 * user interrupt and the CS event for the context 1902 * switch would therefore be before the CS event 1903 * itself... 1904 */ 1905 if (GEM_SHOW_DEBUG() && 1906 !__i915_request_is_complete(*execlists->active)) { 1907 struct i915_request *rq = *execlists->active; 1908 const u32 *regs __maybe_unused = 1909 rq->context->lrc_reg_state; 1910 1911 ENGINE_TRACE(engine, 1912 "context completed before request!\n"); 1913 ENGINE_TRACE(engine, 1914 "ring:{start:0x%08x, head:%04x, tail:%04x, ctl:%08x, mode:%08x}\n", 1915 ENGINE_READ(engine, RING_START), 1916 ENGINE_READ(engine, RING_HEAD) & HEAD_ADDR, 1917 ENGINE_READ(engine, RING_TAIL) & TAIL_ADDR, 1918 ENGINE_READ(engine, RING_CTL), 1919 ENGINE_READ(engine, RING_MI_MODE)); 1920 ENGINE_TRACE(engine, 1921 "rq:{start:%08x, head:%04x, tail:%04x, seqno:%llx:%d, hwsp:%d}, ", 1922 i915_ggtt_offset(rq->ring->vma), 1923 rq->head, rq->tail, 1924 rq->fence.context, 1925 lower_32_bits(rq->fence.seqno), 1926 hwsp_seqno(rq)); 1927 ENGINE_TRACE(engine, 1928 "ctx:{start:%08x, head:%04x, tail:%04x}, ", 1929 regs[CTX_RING_START], 1930 regs[CTX_RING_HEAD], 1931 regs[CTX_RING_TAIL]); 1932 } 1933 1934 *inactive++ = *execlists->active++; 1935 1936 GEM_BUG_ON(execlists->active - execlists->inflight > 1937 execlists_num_ports(execlists)); 1938 } 1939 } while (head != tail); 1940 1941 /* 1942 * Gen11 has proven to fail wrt global observation point between 1943 * entry and tail update, failing on the ordering and thus 1944 * we see an old entry in the context status buffer. 1945 * 1946 * Forcibly evict out entries for the next gpu csb update, 1947 * to increase the odds that we get a fresh entries with non 1948 * working hardware. The cost for doing so comes out mostly with 1949 * the wash as hardware, working or not, will need to do the 1950 * invalidation before. 1951 */ 1952 invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); 1953 1954 /* 1955 * We assume that any event reflects a change in context flow 1956 * and merits a fresh timeslice. We reinstall the timer after 1957 * inspecting the queue to see if we need to resumbit. 1958 */ 1959 if (*prev != *execlists->active) /* elide lite-restores */ 1960 new_timeslice(execlists); 1961 1962 return inactive; 1963 } 1964 1965 static void post_process_csb(struct i915_request **port, 1966 struct i915_request **last) 1967 { 1968 while (port != last) 1969 execlists_schedule_out(*port++); 1970 } 1971 1972 static void __execlists_hold(struct i915_request *rq) 1973 { 1974 LIST_HEAD(list); 1975 1976 do { 1977 struct i915_dependency *p; 1978 1979 if (i915_request_is_active(rq)) 1980 __i915_request_unsubmit(rq); 1981 1982 clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 1983 list_move_tail(&rq->sched.link, &rq->engine->active.hold); 1984 i915_request_set_hold(rq); 1985 RQ_TRACE(rq, "on hold\n"); 1986 1987 for_each_waiter(p, rq) { 1988 struct i915_request *w = 1989 container_of(p->waiter, typeof(*w), sched); 1990 1991 if (p->flags & I915_DEPENDENCY_WEAK) 1992 continue; 1993 1994 /* Leave semaphores spinning on the other engines */ 1995 if (w->engine != rq->engine) 1996 continue; 1997 1998 if (!i915_request_is_ready(w)) 1999 continue; 2000 2001 if (__i915_request_is_complete(w)) 2002 continue; 2003 2004 if (i915_request_on_hold(w)) 2005 continue; 2006 2007 list_move_tail(&w->sched.link, &list); 2008 } 2009 2010 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2011 } while (rq); 2012 } 2013 2014 static bool execlists_hold(struct intel_engine_cs *engine, 2015 struct i915_request *rq) 2016 { 2017 if (i915_request_on_hold(rq)) 2018 return false; 2019 2020 spin_lock_irq(&engine->active.lock); 2021 2022 if (__i915_request_is_complete(rq)) { /* too late! */ 2023 rq = NULL; 2024 goto unlock; 2025 } 2026 2027 /* 2028 * Transfer this request onto the hold queue to prevent it 2029 * being resumbitted to HW (and potentially completed) before we have 2030 * released it. Since we may have already submitted following 2031 * requests, we need to remove those as well. 2032 */ 2033 GEM_BUG_ON(i915_request_on_hold(rq)); 2034 GEM_BUG_ON(rq->engine != engine); 2035 __execlists_hold(rq); 2036 GEM_BUG_ON(list_empty(&engine->active.hold)); 2037 2038 unlock: 2039 spin_unlock_irq(&engine->active.lock); 2040 return rq; 2041 } 2042 2043 static bool hold_request(const struct i915_request *rq) 2044 { 2045 struct i915_dependency *p; 2046 bool result = false; 2047 2048 /* 2049 * If one of our ancestors is on hold, we must also be on hold, 2050 * otherwise we will bypass it and execute before it. 2051 */ 2052 rcu_read_lock(); 2053 for_each_signaler(p, rq) { 2054 const struct i915_request *s = 2055 container_of(p->signaler, typeof(*s), sched); 2056 2057 if (s->engine != rq->engine) 2058 continue; 2059 2060 result = i915_request_on_hold(s); 2061 if (result) 2062 break; 2063 } 2064 rcu_read_unlock(); 2065 2066 return result; 2067 } 2068 2069 static void __execlists_unhold(struct i915_request *rq) 2070 { 2071 LIST_HEAD(list); 2072 2073 do { 2074 struct i915_dependency *p; 2075 2076 RQ_TRACE(rq, "hold release\n"); 2077 2078 GEM_BUG_ON(!i915_request_on_hold(rq)); 2079 GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); 2080 2081 i915_request_clear_hold(rq); 2082 list_move_tail(&rq->sched.link, 2083 i915_sched_lookup_priolist(rq->engine, 2084 rq_prio(rq))); 2085 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2086 2087 /* Also release any children on this engine that are ready */ 2088 for_each_waiter(p, rq) { 2089 struct i915_request *w = 2090 container_of(p->waiter, typeof(*w), sched); 2091 2092 if (p->flags & I915_DEPENDENCY_WEAK) 2093 continue; 2094 2095 /* Propagate any change in error status */ 2096 if (rq->fence.error) 2097 i915_request_set_error_once(w, rq->fence.error); 2098 2099 if (w->engine != rq->engine) 2100 continue; 2101 2102 if (!i915_request_on_hold(w)) 2103 continue; 2104 2105 /* Check that no other parents are also on hold */ 2106 if (hold_request(w)) 2107 continue; 2108 2109 list_move_tail(&w->sched.link, &list); 2110 } 2111 2112 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2113 } while (rq); 2114 } 2115 2116 static void execlists_unhold(struct intel_engine_cs *engine, 2117 struct i915_request *rq) 2118 { 2119 spin_lock_irq(&engine->active.lock); 2120 2121 /* 2122 * Move this request back to the priority queue, and all of its 2123 * children and grandchildren that were suspended along with it. 2124 */ 2125 __execlists_unhold(rq); 2126 2127 if (rq_prio(rq) > engine->execlists.queue_priority_hint) { 2128 engine->execlists.queue_priority_hint = rq_prio(rq); 2129 tasklet_hi_schedule(&engine->execlists.tasklet); 2130 } 2131 2132 spin_unlock_irq(&engine->active.lock); 2133 } 2134 2135 struct execlists_capture { 2136 struct work_struct work; 2137 struct i915_request *rq; 2138 struct i915_gpu_coredump *error; 2139 }; 2140 2141 static void execlists_capture_work(struct work_struct *work) 2142 { 2143 struct execlists_capture *cap = container_of(work, typeof(*cap), work); 2144 const gfp_t gfp = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN; 2145 struct intel_engine_cs *engine = cap->rq->engine; 2146 struct intel_gt_coredump *gt = cap->error->gt; 2147 struct intel_engine_capture_vma *vma; 2148 2149 /* Compress all the objects attached to the request, slow! */ 2150 vma = intel_engine_coredump_add_request(gt->engine, cap->rq, gfp); 2151 if (vma) { 2152 struct i915_vma_compress *compress = 2153 i915_vma_capture_prepare(gt); 2154 2155 intel_engine_coredump_add_vma(gt->engine, vma, compress); 2156 i915_vma_capture_finish(gt, compress); 2157 } 2158 2159 gt->simulated = gt->engine->simulated; 2160 cap->error->simulated = gt->simulated; 2161 2162 /* Publish the error state, and announce it to the world */ 2163 i915_error_state_store(cap->error); 2164 i915_gpu_coredump_put(cap->error); 2165 2166 /* Return this request and all that depend upon it for signaling */ 2167 execlists_unhold(engine, cap->rq); 2168 i915_request_put(cap->rq); 2169 2170 kfree(cap); 2171 } 2172 2173 static struct execlists_capture *capture_regs(struct intel_engine_cs *engine) 2174 { 2175 const gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN; 2176 struct execlists_capture *cap; 2177 2178 cap = kmalloc(sizeof(*cap), gfp); 2179 if (!cap) 2180 return NULL; 2181 2182 cap->error = i915_gpu_coredump_alloc(engine->i915, gfp); 2183 if (!cap->error) 2184 goto err_cap; 2185 2186 cap->error->gt = intel_gt_coredump_alloc(engine->gt, gfp); 2187 if (!cap->error->gt) 2188 goto err_gpu; 2189 2190 cap->error->gt->engine = intel_engine_coredump_alloc(engine, gfp); 2191 if (!cap->error->gt->engine) 2192 goto err_gt; 2193 2194 cap->error->gt->engine->hung = true; 2195 2196 return cap; 2197 2198 err_gt: 2199 kfree(cap->error->gt); 2200 err_gpu: 2201 kfree(cap->error); 2202 err_cap: 2203 kfree(cap); 2204 return NULL; 2205 } 2206 2207 static struct i915_request * 2208 active_context(struct intel_engine_cs *engine, u32 ccid) 2209 { 2210 const struct intel_engine_execlists * const el = &engine->execlists; 2211 struct i915_request * const *port, *rq; 2212 2213 /* 2214 * Use the most recent result from process_csb(), but just in case 2215 * we trigger an error (via interrupt) before the first CS event has 2216 * been written, peek at the next submission. 2217 */ 2218 2219 for (port = el->active; (rq = *port); port++) { 2220 if (rq->context->lrc.ccid == ccid) { 2221 ENGINE_TRACE(engine, 2222 "ccid:%x found at active:%zd\n", 2223 ccid, port - el->active); 2224 return rq; 2225 } 2226 } 2227 2228 for (port = el->pending; (rq = *port); port++) { 2229 if (rq->context->lrc.ccid == ccid) { 2230 ENGINE_TRACE(engine, 2231 "ccid:%x found at pending:%zd\n", 2232 ccid, port - el->pending); 2233 return rq; 2234 } 2235 } 2236 2237 ENGINE_TRACE(engine, "ccid:%x not found\n", ccid); 2238 return NULL; 2239 } 2240 2241 static u32 active_ccid(struct intel_engine_cs *engine) 2242 { 2243 return ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI); 2244 } 2245 2246 static void execlists_capture(struct intel_engine_cs *engine) 2247 { 2248 struct execlists_capture *cap; 2249 2250 if (!IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)) 2251 return; 2252 2253 /* 2254 * We need to _quickly_ capture the engine state before we reset. 2255 * We are inside an atomic section (softirq) here and we are delaying 2256 * the forced preemption event. 2257 */ 2258 cap = capture_regs(engine); 2259 if (!cap) 2260 return; 2261 2262 spin_lock_irq(&engine->active.lock); 2263 cap->rq = active_context(engine, active_ccid(engine)); 2264 if (cap->rq) { 2265 cap->rq = active_request(cap->rq->context->timeline, cap->rq); 2266 cap->rq = i915_request_get_rcu(cap->rq); 2267 } 2268 spin_unlock_irq(&engine->active.lock); 2269 if (!cap->rq) 2270 goto err_free; 2271 2272 /* 2273 * Remove the request from the execlists queue, and take ownership 2274 * of the request. We pass it to our worker who will _slowly_ compress 2275 * all the pages the _user_ requested for debugging their batch, after 2276 * which we return it to the queue for signaling. 2277 * 2278 * By removing them from the execlists queue, we also remove the 2279 * requests from being processed by __unwind_incomplete_requests() 2280 * during the intel_engine_reset(), and so they will *not* be replayed 2281 * afterwards. 2282 * 2283 * Note that because we have not yet reset the engine at this point, 2284 * it is possible for the request that we have identified as being 2285 * guilty, did in fact complete and we will then hit an arbitration 2286 * point allowing the outstanding preemption to succeed. The likelihood 2287 * of that is very low (as capturing of the engine registers should be 2288 * fast enough to run inside an irq-off atomic section!), so we will 2289 * simply hold that request accountable for being non-preemptible 2290 * long enough to force the reset. 2291 */ 2292 if (!execlists_hold(engine, cap->rq)) 2293 goto err_rq; 2294 2295 INIT_WORK(&cap->work, execlists_capture_work); 2296 schedule_work(&cap->work); 2297 return; 2298 2299 err_rq: 2300 i915_request_put(cap->rq); 2301 err_free: 2302 i915_gpu_coredump_put(cap->error); 2303 kfree(cap); 2304 } 2305 2306 static void execlists_reset(struct intel_engine_cs *engine, const char *msg) 2307 { 2308 const unsigned int bit = I915_RESET_ENGINE + engine->id; 2309 unsigned long *lock = &engine->gt->reset.flags; 2310 2311 if (!intel_has_reset_engine(engine->gt)) 2312 return; 2313 2314 if (test_and_set_bit(bit, lock)) 2315 return; 2316 2317 ENGINE_TRACE(engine, "reset for %s\n", msg); 2318 2319 /* Mark this tasklet as disabled to avoid waiting for it to complete */ 2320 tasklet_disable_nosync(&engine->execlists.tasklet); 2321 2322 ring_set_paused(engine, 1); /* Freeze the current request in place */ 2323 execlists_capture(engine); 2324 intel_engine_reset(engine, msg); 2325 2326 tasklet_enable(&engine->execlists.tasklet); 2327 clear_and_wake_up_bit(bit, lock); 2328 } 2329 2330 static bool preempt_timeout(const struct intel_engine_cs *const engine) 2331 { 2332 const struct timer_list *t = &engine->execlists.preempt; 2333 2334 if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT) 2335 return false; 2336 2337 if (!timer_expired(t)) 2338 return false; 2339 2340 return engine->execlists.pending[0]; 2341 } 2342 2343 /* 2344 * Check the unread Context Status Buffers and manage the submission of new 2345 * contexts to the ELSP accordingly. 2346 */ 2347 static void execlists_submission_tasklet(unsigned long data) 2348 { 2349 struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; 2350 struct i915_request *post[2 * EXECLIST_MAX_PORTS]; 2351 struct i915_request **inactive; 2352 2353 rcu_read_lock(); 2354 inactive = process_csb(engine, post); 2355 GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); 2356 2357 if (unlikely(preempt_timeout(engine))) { 2358 cancel_timer(&engine->execlists.preempt); 2359 engine->execlists.error_interrupt |= ERROR_PREEMPT; 2360 } 2361 2362 if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) { 2363 const char *msg; 2364 2365 /* Generate the error message in priority wrt to the user! */ 2366 if (engine->execlists.error_interrupt & GENMASK(15, 0)) 2367 msg = "CS error"; /* thrown by a user payload */ 2368 else if (engine->execlists.error_interrupt & ERROR_CSB) 2369 msg = "invalid CSB event"; 2370 else if (engine->execlists.error_interrupt & ERROR_PREEMPT) 2371 msg = "preemption time out"; 2372 else 2373 msg = "internal error"; 2374 2375 engine->execlists.error_interrupt = 0; 2376 execlists_reset(engine, msg); 2377 } 2378 2379 if (!engine->execlists.pending[0]) { 2380 execlists_dequeue_irq(engine); 2381 start_timeslice(engine); 2382 } 2383 2384 post_process_csb(post, inactive); 2385 rcu_read_unlock(); 2386 } 2387 2388 static void __execlists_kick(struct intel_engine_execlists *execlists) 2389 { 2390 /* Kick the tasklet for some interrupt coalescing and reset handling */ 2391 tasklet_hi_schedule(&execlists->tasklet); 2392 } 2393 2394 #define execlists_kick(t, member) \ 2395 __execlists_kick(container_of(t, struct intel_engine_execlists, member)) 2396 2397 static void execlists_timeslice(struct timer_list *timer) 2398 { 2399 execlists_kick(timer, timer); 2400 } 2401 2402 static void execlists_preempt(struct timer_list *timer) 2403 { 2404 execlists_kick(timer, preempt); 2405 } 2406 2407 static void queue_request(struct intel_engine_cs *engine, 2408 struct i915_request *rq) 2409 { 2410 GEM_BUG_ON(!list_empty(&rq->sched.link)); 2411 list_add_tail(&rq->sched.link, 2412 i915_sched_lookup_priolist(engine, rq_prio(rq))); 2413 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2414 } 2415 2416 static bool submit_queue(struct intel_engine_cs *engine, 2417 const struct i915_request *rq) 2418 { 2419 struct intel_engine_execlists *execlists = &engine->execlists; 2420 2421 if (rq_prio(rq) <= execlists->queue_priority_hint) 2422 return false; 2423 2424 execlists->queue_priority_hint = rq_prio(rq); 2425 return true; 2426 } 2427 2428 static bool ancestor_on_hold(const struct intel_engine_cs *engine, 2429 const struct i915_request *rq) 2430 { 2431 GEM_BUG_ON(i915_request_on_hold(rq)); 2432 return !list_empty(&engine->active.hold) && hold_request(rq); 2433 } 2434 2435 static void execlists_submit_request(struct i915_request *request) 2436 { 2437 struct intel_engine_cs *engine = request->engine; 2438 unsigned long flags; 2439 2440 /* Will be called from irq-context when using foreign fences. */ 2441 spin_lock_irqsave(&engine->active.lock, flags); 2442 2443 if (unlikely(ancestor_on_hold(engine, request))) { 2444 RQ_TRACE(request, "ancestor on hold\n"); 2445 list_add_tail(&request->sched.link, &engine->active.hold); 2446 i915_request_set_hold(request); 2447 } else { 2448 queue_request(engine, request); 2449 2450 GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); 2451 GEM_BUG_ON(list_empty(&request->sched.link)); 2452 2453 if (submit_queue(engine, request)) 2454 __execlists_kick(&engine->execlists); 2455 } 2456 2457 spin_unlock_irqrestore(&engine->active.lock, flags); 2458 } 2459 2460 static int execlists_context_pre_pin(struct intel_context *ce, 2461 struct i915_gem_ww_ctx *ww, 2462 void **vaddr) 2463 { 2464 return lrc_pre_pin(ce, ce->engine, ww, vaddr); 2465 } 2466 2467 static int execlists_context_pin(struct intel_context *ce, void *vaddr) 2468 { 2469 return lrc_pin(ce, ce->engine, vaddr); 2470 } 2471 2472 static int execlists_context_alloc(struct intel_context *ce) 2473 { 2474 return lrc_alloc(ce, ce->engine); 2475 } 2476 2477 static const struct intel_context_ops execlists_context_ops = { 2478 .flags = COPS_HAS_INFLIGHT, 2479 2480 .alloc = execlists_context_alloc, 2481 2482 .pre_pin = execlists_context_pre_pin, 2483 .pin = execlists_context_pin, 2484 .unpin = lrc_unpin, 2485 .post_unpin = lrc_post_unpin, 2486 2487 .enter = intel_context_enter_engine, 2488 .exit = intel_context_exit_engine, 2489 2490 .reset = lrc_reset, 2491 .destroy = lrc_destroy, 2492 }; 2493 2494 static int emit_pdps(struct i915_request *rq) 2495 { 2496 const struct intel_engine_cs * const engine = rq->engine; 2497 struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(rq->context->vm); 2498 int err, i; 2499 u32 *cs; 2500 2501 GEM_BUG_ON(intel_vgpu_active(rq->engine->i915)); 2502 2503 /* 2504 * Beware ye of the dragons, this sequence is magic! 2505 * 2506 * Small changes to this sequence can cause anything from 2507 * GPU hangs to forcewake errors and machine lockups! 2508 */ 2509 2510 cs = intel_ring_begin(rq, 2); 2511 if (IS_ERR(cs)) 2512 return PTR_ERR(cs); 2513 2514 *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 2515 *cs++ = MI_NOOP; 2516 intel_ring_advance(rq, cs); 2517 2518 /* Flush any residual operations from the context load */ 2519 err = engine->emit_flush(rq, EMIT_FLUSH); 2520 if (err) 2521 return err; 2522 2523 /* Magic required to prevent forcewake errors! */ 2524 err = engine->emit_flush(rq, EMIT_INVALIDATE); 2525 if (err) 2526 return err; 2527 2528 cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2); 2529 if (IS_ERR(cs)) 2530 return PTR_ERR(cs); 2531 2532 /* Ensure the LRI have landed before we invalidate & continue */ 2533 *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED; 2534 for (i = GEN8_3LVL_PDPES; i--; ) { 2535 const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i); 2536 u32 base = engine->mmio_base; 2537 2538 *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i)); 2539 *cs++ = upper_32_bits(pd_daddr); 2540 *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i)); 2541 *cs++ = lower_32_bits(pd_daddr); 2542 } 2543 *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 2544 intel_ring_advance(rq, cs); 2545 2546 intel_ring_advance(rq, cs); 2547 2548 return 0; 2549 } 2550 2551 static int execlists_request_alloc(struct i915_request *request) 2552 { 2553 int ret; 2554 2555 GEM_BUG_ON(!intel_context_is_pinned(request->context)); 2556 2557 /* 2558 * Flush enough space to reduce the likelihood of waiting after 2559 * we start building the request - in which case we will just 2560 * have to repeat work. 2561 */ 2562 request->reserved_space += EXECLISTS_REQUEST_SIZE; 2563 2564 /* 2565 * Note that after this point, we have committed to using 2566 * this request as it is being used to both track the 2567 * state of engine initialisation and liveness of the 2568 * golden renderstate above. Think twice before you try 2569 * to cancel/unwind this request now. 2570 */ 2571 2572 if (!i915_vm_is_4lvl(request->context->vm)) { 2573 ret = emit_pdps(request); 2574 if (ret) 2575 return ret; 2576 } 2577 2578 /* Unconditionally invalidate GPU caches and TLBs. */ 2579 ret = request->engine->emit_flush(request, EMIT_INVALIDATE); 2580 if (ret) 2581 return ret; 2582 2583 request->reserved_space -= EXECLISTS_REQUEST_SIZE; 2584 return 0; 2585 } 2586 2587 static void reset_csb_pointers(struct intel_engine_cs *engine) 2588 { 2589 struct intel_engine_execlists * const execlists = &engine->execlists; 2590 const unsigned int reset_value = execlists->csb_size - 1; 2591 2592 ring_set_paused(engine, 0); 2593 2594 /* 2595 * Sometimes Icelake forgets to reset its pointers on a GPU reset. 2596 * Bludgeon them with a mmio update to be sure. 2597 */ 2598 ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 2599 0xffff << 16 | reset_value << 8 | reset_value); 2600 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 2601 2602 /* 2603 * After a reset, the HW starts writing into CSB entry [0]. We 2604 * therefore have to set our HEAD pointer back one entry so that 2605 * the *first* entry we check is entry 0. To complicate this further, 2606 * as we don't wait for the first interrupt after reset, we have to 2607 * fake the HW write to point back to the last entry so that our 2608 * inline comparison of our cached head position against the last HW 2609 * write works even before the first interrupt. 2610 */ 2611 execlists->csb_head = reset_value; 2612 WRITE_ONCE(*execlists->csb_write, reset_value); 2613 wmb(); /* Make sure this is visible to HW (paranoia?) */ 2614 2615 /* Check that the GPU does indeed update the CSB entries! */ 2616 memset(execlists->csb_status, -1, (reset_value + 1) * sizeof(u64)); 2617 invalidate_csb_entries(&execlists->csb_status[0], 2618 &execlists->csb_status[reset_value]); 2619 2620 /* Once more for luck and our trusty paranoia */ 2621 ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 2622 0xffff << 16 | reset_value << 8 | reset_value); 2623 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 2624 2625 GEM_BUG_ON(READ_ONCE(*execlists->csb_write) != reset_value); 2626 } 2627 2628 static void sanitize_hwsp(struct intel_engine_cs *engine) 2629 { 2630 struct intel_timeline *tl; 2631 2632 list_for_each_entry(tl, &engine->status_page.timelines, engine_link) 2633 intel_timeline_reset_seqno(tl); 2634 } 2635 2636 static void execlists_sanitize(struct intel_engine_cs *engine) 2637 { 2638 GEM_BUG_ON(execlists_active(&engine->execlists)); 2639 2640 /* 2641 * Poison residual state on resume, in case the suspend didn't! 2642 * 2643 * We have to assume that across suspend/resume (or other loss 2644 * of control) that the contents of our pinned buffers has been 2645 * lost, replaced by garbage. Since this doesn't always happen, 2646 * let's poison such state so that we more quickly spot when 2647 * we falsely assume it has been preserved. 2648 */ 2649 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 2650 memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE); 2651 2652 reset_csb_pointers(engine); 2653 2654 /* 2655 * The kernel_context HWSP is stored in the status_page. As above, 2656 * that may be lost on resume/initialisation, and so we need to 2657 * reset the value in the HWSP. 2658 */ 2659 sanitize_hwsp(engine); 2660 2661 /* And scrub the dirty cachelines for the HWSP */ 2662 clflush_cache_range(engine->status_page.addr, PAGE_SIZE); 2663 } 2664 2665 static void enable_error_interrupt(struct intel_engine_cs *engine) 2666 { 2667 u32 status; 2668 2669 engine->execlists.error_interrupt = 0; 2670 ENGINE_WRITE(engine, RING_EMR, ~0u); 2671 ENGINE_WRITE(engine, RING_EIR, ~0u); /* clear all existing errors */ 2672 2673 status = ENGINE_READ(engine, RING_ESR); 2674 if (unlikely(status)) { 2675 drm_err(&engine->i915->drm, 2676 "engine '%s' resumed still in error: %08x\n", 2677 engine->name, status); 2678 __intel_gt_reset(engine->gt, engine->mask); 2679 } 2680 2681 /* 2682 * On current gen8+, we have 2 signals to play with 2683 * 2684 * - I915_ERROR_INSTUCTION (bit 0) 2685 * 2686 * Generate an error if the command parser encounters an invalid 2687 * instruction 2688 * 2689 * This is a fatal error. 2690 * 2691 * - CP_PRIV (bit 2) 2692 * 2693 * Generate an error on privilege violation (where the CP replaces 2694 * the instruction with a no-op). This also fires for writes into 2695 * read-only scratch pages. 2696 * 2697 * This is a non-fatal error, parsing continues. 2698 * 2699 * * there are a few others defined for odd HW that we do not use 2700 * 2701 * Since CP_PRIV fires for cases where we have chosen to ignore the 2702 * error (as the HW is validating and suppressing the mistakes), we 2703 * only unmask the instruction error bit. 2704 */ 2705 ENGINE_WRITE(engine, RING_EMR, ~I915_ERROR_INSTRUCTION); 2706 } 2707 2708 static void enable_execlists(struct intel_engine_cs *engine) 2709 { 2710 u32 mode; 2711 2712 assert_forcewakes_active(engine->uncore, FORCEWAKE_ALL); 2713 2714 intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */ 2715 2716 if (INTEL_GEN(engine->i915) >= 11) 2717 mode = _MASKED_BIT_ENABLE(GEN11_GFX_DISABLE_LEGACY_MODE); 2718 else 2719 mode = _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE); 2720 ENGINE_WRITE_FW(engine, RING_MODE_GEN7, mode); 2721 2722 ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING)); 2723 2724 ENGINE_WRITE_FW(engine, 2725 RING_HWS_PGA, 2726 i915_ggtt_offset(engine->status_page.vma)); 2727 ENGINE_POSTING_READ(engine, RING_HWS_PGA); 2728 2729 enable_error_interrupt(engine); 2730 } 2731 2732 static bool unexpected_starting_state(struct intel_engine_cs *engine) 2733 { 2734 bool unexpected = false; 2735 2736 if (ENGINE_READ_FW(engine, RING_MI_MODE) & STOP_RING) { 2737 drm_dbg(&engine->i915->drm, 2738 "STOP_RING still set in RING_MI_MODE\n"); 2739 unexpected = true; 2740 } 2741 2742 return unexpected; 2743 } 2744 2745 static int execlists_resume(struct intel_engine_cs *engine) 2746 { 2747 intel_mocs_init_engine(engine); 2748 2749 intel_breadcrumbs_reset(engine->breadcrumbs); 2750 2751 if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) { 2752 struct drm_printer p = drm_debug_printer(__func__); 2753 2754 intel_engine_dump(engine, &p, NULL); 2755 } 2756 2757 enable_execlists(engine); 2758 2759 return 0; 2760 } 2761 2762 static void execlists_reset_prepare(struct intel_engine_cs *engine) 2763 { 2764 struct intel_engine_execlists * const execlists = &engine->execlists; 2765 2766 ENGINE_TRACE(engine, "depth<-%d\n", 2767 atomic_read(&execlists->tasklet.count)); 2768 2769 /* 2770 * Prevent request submission to the hardware until we have 2771 * completed the reset in i915_gem_reset_finish(). If a request 2772 * is completed by one engine, it may then queue a request 2773 * to a second via its execlists->tasklet *just* as we are 2774 * calling engine->resume() and also writing the ELSP. 2775 * Turning off the execlists->tasklet until the reset is over 2776 * prevents the race. 2777 */ 2778 __tasklet_disable_sync_once(&execlists->tasklet); 2779 GEM_BUG_ON(!reset_in_progress(execlists)); 2780 2781 /* 2782 * We stop engines, otherwise we might get failed reset and a 2783 * dead gpu (on elk). Also as modern gpu as kbl can suffer 2784 * from system hang if batchbuffer is progressing when 2785 * the reset is issued, regardless of READY_TO_RESET ack. 2786 * Thus assume it is best to stop engines on all gens 2787 * where we have a gpu reset. 2788 * 2789 * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) 2790 * 2791 * FIXME: Wa for more modern gens needs to be validated 2792 */ 2793 ring_set_paused(engine, 1); 2794 intel_engine_stop_cs(engine); 2795 2796 engine->execlists.reset_ccid = active_ccid(engine); 2797 } 2798 2799 static struct i915_request ** 2800 reset_csb(struct intel_engine_cs *engine, struct i915_request **inactive) 2801 { 2802 struct intel_engine_execlists * const execlists = &engine->execlists; 2803 2804 mb(); /* paranoia: read the CSB pointers from after the reset */ 2805 clflush(execlists->csb_write); 2806 mb(); 2807 2808 inactive = process_csb(engine, inactive); /* drain preemption events */ 2809 2810 /* Following the reset, we need to reload the CSB read/write pointers */ 2811 reset_csb_pointers(engine); 2812 2813 return inactive; 2814 } 2815 2816 static void 2817 execlists_reset_active(struct intel_engine_cs *engine, bool stalled) 2818 { 2819 struct intel_context *ce; 2820 struct i915_request *rq; 2821 u32 head; 2822 2823 /* 2824 * Save the currently executing context, even if we completed 2825 * its request, it was still running at the time of the 2826 * reset and will have been clobbered. 2827 */ 2828 rq = active_context(engine, engine->execlists.reset_ccid); 2829 if (!rq) 2830 return; 2831 2832 ce = rq->context; 2833 GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); 2834 2835 if (__i915_request_is_complete(rq)) { 2836 /* Idle context; tidy up the ring so we can restart afresh */ 2837 head = intel_ring_wrap(ce->ring, rq->tail); 2838 goto out_replay; 2839 } 2840 2841 /* We still have requests in-flight; the engine should be active */ 2842 GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 2843 2844 /* Context has requests still in-flight; it should not be idle! */ 2845 GEM_BUG_ON(i915_active_is_idle(&ce->active)); 2846 2847 rq = active_request(ce->timeline, rq); 2848 head = intel_ring_wrap(ce->ring, rq->head); 2849 GEM_BUG_ON(head == ce->ring->tail); 2850 2851 /* 2852 * If this request hasn't started yet, e.g. it is waiting on a 2853 * semaphore, we need to avoid skipping the request or else we 2854 * break the signaling chain. However, if the context is corrupt 2855 * the request will not restart and we will be stuck with a wedged 2856 * device. It is quite often the case that if we issue a reset 2857 * while the GPU is loading the context image, that the context 2858 * image becomes corrupt. 2859 * 2860 * Otherwise, if we have not started yet, the request should replay 2861 * perfectly and we do not need to flag the result as being erroneous. 2862 */ 2863 if (!__i915_request_has_started(rq)) 2864 goto out_replay; 2865 2866 /* 2867 * If the request was innocent, we leave the request in the ELSP 2868 * and will try to replay it on restarting. The context image may 2869 * have been corrupted by the reset, in which case we may have 2870 * to service a new GPU hang, but more likely we can continue on 2871 * without impact. 2872 * 2873 * If the request was guilty, we presume the context is corrupt 2874 * and have to at least restore the RING register in the context 2875 * image back to the expected values to skip over the guilty request. 2876 */ 2877 __i915_request_reset(rq, stalled); 2878 2879 /* 2880 * We want a simple context + ring to execute the breadcrumb update. 2881 * We cannot rely on the context being intact across the GPU hang, 2882 * so clear it and rebuild just what we need for the breadcrumb. 2883 * All pending requests for this context will be zapped, and any 2884 * future request will be after userspace has had the opportunity 2885 * to recreate its own state. 2886 */ 2887 out_replay: 2888 ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n", 2889 head, ce->ring->tail); 2890 lrc_reset_regs(ce, engine); 2891 ce->lrc.lrca = lrc_update_regs(ce, engine, head); 2892 } 2893 2894 static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled) 2895 { 2896 struct intel_engine_execlists * const execlists = &engine->execlists; 2897 struct i915_request *post[2 * EXECLIST_MAX_PORTS]; 2898 struct i915_request **inactive; 2899 2900 rcu_read_lock(); 2901 inactive = reset_csb(engine, post); 2902 2903 execlists_reset_active(engine, true); 2904 2905 inactive = cancel_port_requests(execlists, inactive); 2906 post_process_csb(post, inactive); 2907 rcu_read_unlock(); 2908 } 2909 2910 static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) 2911 { 2912 unsigned long flags; 2913 2914 ENGINE_TRACE(engine, "\n"); 2915 2916 /* Process the csb, find the guilty context and throw away */ 2917 execlists_reset_csb(engine, stalled); 2918 2919 /* Push back any incomplete requests for replay after the reset. */ 2920 rcu_read_lock(); 2921 spin_lock_irqsave(&engine->active.lock, flags); 2922 __unwind_incomplete_requests(engine); 2923 spin_unlock_irqrestore(&engine->active.lock, flags); 2924 rcu_read_unlock(); 2925 } 2926 2927 static void nop_submission_tasklet(unsigned long data) 2928 { 2929 struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; 2930 2931 /* The driver is wedged; don't process any more events. */ 2932 WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN); 2933 } 2934 2935 static void execlists_reset_cancel(struct intel_engine_cs *engine) 2936 { 2937 struct intel_engine_execlists * const execlists = &engine->execlists; 2938 struct i915_request *rq, *rn; 2939 struct rb_node *rb; 2940 unsigned long flags; 2941 2942 ENGINE_TRACE(engine, "\n"); 2943 2944 /* 2945 * Before we call engine->cancel_requests(), we should have exclusive 2946 * access to the submission state. This is arranged for us by the 2947 * caller disabling the interrupt generation, the tasklet and other 2948 * threads that may then access the same state, giving us a free hand 2949 * to reset state. However, we still need to let lockdep be aware that 2950 * we know this state may be accessed in hardirq context, so we 2951 * disable the irq around this manipulation and we want to keep 2952 * the spinlock focused on its duties and not accidentally conflate 2953 * coverage to the submission's irq state. (Similarly, although we 2954 * shouldn't need to disable irq around the manipulation of the 2955 * submission's irq state, we also wish to remind ourselves that 2956 * it is irq state.) 2957 */ 2958 execlists_reset_csb(engine, true); 2959 2960 rcu_read_lock(); 2961 spin_lock_irqsave(&engine->active.lock, flags); 2962 2963 /* Mark all executing requests as skipped. */ 2964 list_for_each_entry(rq, &engine->active.requests, sched.link) 2965 i915_request_mark_eio(rq); 2966 intel_engine_signal_breadcrumbs(engine); 2967 2968 /* Flush the queued requests to the timeline list (for retiring). */ 2969 while ((rb = rb_first_cached(&execlists->queue))) { 2970 struct i915_priolist *p = to_priolist(rb); 2971 int i; 2972 2973 priolist_for_each_request_consume(rq, rn, p, i) { 2974 i915_request_mark_eio(rq); 2975 __i915_request_submit(rq); 2976 } 2977 2978 rb_erase_cached(&p->node, &execlists->queue); 2979 i915_priolist_free(p); 2980 } 2981 2982 /* On-hold requests will be flushed to timeline upon their release */ 2983 list_for_each_entry(rq, &engine->active.hold, sched.link) 2984 i915_request_mark_eio(rq); 2985 2986 /* Cancel all attached virtual engines */ 2987 while ((rb = rb_first_cached(&execlists->virtual))) { 2988 struct virtual_engine *ve = 2989 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 2990 2991 rb_erase_cached(rb, &execlists->virtual); 2992 RB_CLEAR_NODE(rb); 2993 2994 spin_lock(&ve->base.active.lock); 2995 rq = fetch_and_zero(&ve->request); 2996 if (rq) { 2997 i915_request_mark_eio(rq); 2998 2999 rq->engine = engine; 3000 __i915_request_submit(rq); 3001 i915_request_put(rq); 3002 3003 ve->base.execlists.queue_priority_hint = INT_MIN; 3004 } 3005 spin_unlock(&ve->base.active.lock); 3006 } 3007 3008 /* Remaining _unready_ requests will be nop'ed when submitted */ 3009 3010 execlists->queue_priority_hint = INT_MIN; 3011 execlists->queue = RB_ROOT_CACHED; 3012 3013 GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); 3014 execlists->tasklet.func = nop_submission_tasklet; 3015 3016 spin_unlock_irqrestore(&engine->active.lock, flags); 3017 rcu_read_unlock(); 3018 } 3019 3020 static void execlists_reset_finish(struct intel_engine_cs *engine) 3021 { 3022 struct intel_engine_execlists * const execlists = &engine->execlists; 3023 3024 /* 3025 * After a GPU reset, we may have requests to replay. Do so now while 3026 * we still have the forcewake to be sure that the GPU is not allowed 3027 * to sleep before we restart and reload a context. 3028 * 3029 * If the GPU reset fails, the engine may still be alive with requests 3030 * inflight. We expect those to complete, or for the device to be 3031 * reset as the next level of recovery, and as a final resort we 3032 * will declare the device wedged. 3033 */ 3034 GEM_BUG_ON(!reset_in_progress(execlists)); 3035 3036 /* And kick in case we missed a new request submission. */ 3037 if (__tasklet_enable(&execlists->tasklet)) 3038 __execlists_kick(execlists); 3039 3040 ENGINE_TRACE(engine, "depth->%d\n", 3041 atomic_read(&execlists->tasklet.count)); 3042 } 3043 3044 static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine) 3045 { 3046 ENGINE_WRITE(engine, RING_IMR, 3047 ~(engine->irq_enable_mask | engine->irq_keep_mask)); 3048 ENGINE_POSTING_READ(engine, RING_IMR); 3049 } 3050 3051 static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine) 3052 { 3053 ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask); 3054 } 3055 3056 static void execlists_park(struct intel_engine_cs *engine) 3057 { 3058 cancel_timer(&engine->execlists.timer); 3059 cancel_timer(&engine->execlists.preempt); 3060 } 3061 3062 static bool can_preempt(struct intel_engine_cs *engine) 3063 { 3064 if (INTEL_GEN(engine->i915) > 8) 3065 return true; 3066 3067 /* GPGPU on bdw requires extra w/a; not implemented */ 3068 return engine->class != RENDER_CLASS; 3069 } 3070 3071 static void execlists_set_default_submission(struct intel_engine_cs *engine) 3072 { 3073 engine->submit_request = execlists_submit_request; 3074 engine->schedule = i915_schedule; 3075 engine->execlists.tasklet.func = execlists_submission_tasklet; 3076 3077 engine->reset.prepare = execlists_reset_prepare; 3078 engine->reset.rewind = execlists_reset_rewind; 3079 engine->reset.cancel = execlists_reset_cancel; 3080 engine->reset.finish = execlists_reset_finish; 3081 3082 engine->park = execlists_park; 3083 engine->unpark = NULL; 3084 3085 engine->flags |= I915_ENGINE_SUPPORTS_STATS; 3086 if (!intel_vgpu_active(engine->i915)) { 3087 engine->flags |= I915_ENGINE_HAS_SEMAPHORES; 3088 if (can_preempt(engine)) { 3089 engine->flags |= I915_ENGINE_HAS_PREEMPTION; 3090 if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) 3091 engine->flags |= I915_ENGINE_HAS_TIMESLICES; 3092 } 3093 } 3094 3095 if (intel_engine_has_preemption(engine)) 3096 engine->emit_bb_start = gen8_emit_bb_start; 3097 else 3098 engine->emit_bb_start = gen8_emit_bb_start_noarb; 3099 } 3100 3101 static void execlists_shutdown(struct intel_engine_cs *engine) 3102 { 3103 /* Synchronise with residual timers and any softirq they raise */ 3104 del_timer_sync(&engine->execlists.timer); 3105 del_timer_sync(&engine->execlists.preempt); 3106 tasklet_kill(&engine->execlists.tasklet); 3107 } 3108 3109 static void execlists_release(struct intel_engine_cs *engine) 3110 { 3111 engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ 3112 3113 execlists_shutdown(engine); 3114 3115 intel_engine_cleanup_common(engine); 3116 lrc_fini_wa_ctx(engine); 3117 } 3118 3119 static void 3120 logical_ring_default_vfuncs(struct intel_engine_cs *engine) 3121 { 3122 /* Default vfuncs which can be overriden by each engine. */ 3123 3124 engine->resume = execlists_resume; 3125 3126 engine->cops = &execlists_context_ops; 3127 engine->request_alloc = execlists_request_alloc; 3128 3129 engine->emit_flush = gen8_emit_flush_xcs; 3130 engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; 3131 engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs; 3132 if (INTEL_GEN(engine->i915) >= 12) { 3133 engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_xcs; 3134 engine->emit_flush = gen12_emit_flush_xcs; 3135 } 3136 engine->set_default_submission = execlists_set_default_submission; 3137 3138 if (INTEL_GEN(engine->i915) < 11) { 3139 engine->irq_enable = gen8_logical_ring_enable_irq; 3140 engine->irq_disable = gen8_logical_ring_disable_irq; 3141 } else { 3142 /* 3143 * TODO: On Gen11 interrupt masks need to be clear 3144 * to allow C6 entry. Keep interrupts enabled at 3145 * and take the hit of generating extra interrupts 3146 * until a more refined solution exists. 3147 */ 3148 } 3149 } 3150 3151 static void logical_ring_default_irqs(struct intel_engine_cs *engine) 3152 { 3153 unsigned int shift = 0; 3154 3155 if (INTEL_GEN(engine->i915) < 11) { 3156 const u8 irq_shifts[] = { 3157 [RCS0] = GEN8_RCS_IRQ_SHIFT, 3158 [BCS0] = GEN8_BCS_IRQ_SHIFT, 3159 [VCS0] = GEN8_VCS0_IRQ_SHIFT, 3160 [VCS1] = GEN8_VCS1_IRQ_SHIFT, 3161 [VECS0] = GEN8_VECS_IRQ_SHIFT, 3162 }; 3163 3164 shift = irq_shifts[engine->id]; 3165 } 3166 3167 engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift; 3168 engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift; 3169 engine->irq_keep_mask |= GT_CS_MASTER_ERROR_INTERRUPT << shift; 3170 engine->irq_keep_mask |= GT_WAIT_SEMAPHORE_INTERRUPT << shift; 3171 } 3172 3173 static void rcs_submission_override(struct intel_engine_cs *engine) 3174 { 3175 switch (INTEL_GEN(engine->i915)) { 3176 case 12: 3177 engine->emit_flush = gen12_emit_flush_rcs; 3178 engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs; 3179 break; 3180 case 11: 3181 engine->emit_flush = gen11_emit_flush_rcs; 3182 engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs; 3183 break; 3184 default: 3185 engine->emit_flush = gen8_emit_flush_rcs; 3186 engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; 3187 break; 3188 } 3189 } 3190 3191 int intel_execlists_submission_setup(struct intel_engine_cs *engine) 3192 { 3193 struct intel_engine_execlists * const execlists = &engine->execlists; 3194 struct drm_i915_private *i915 = engine->i915; 3195 struct intel_uncore *uncore = engine->uncore; 3196 u32 base = engine->mmio_base; 3197 3198 tasklet_init(&engine->execlists.tasklet, 3199 execlists_submission_tasklet, (unsigned long)engine); 3200 timer_setup(&engine->execlists.timer, execlists_timeslice, 0); 3201 timer_setup(&engine->execlists.preempt, execlists_preempt, 0); 3202 3203 logical_ring_default_vfuncs(engine); 3204 logical_ring_default_irqs(engine); 3205 3206 if (engine->class == RENDER_CLASS) 3207 rcs_submission_override(engine); 3208 3209 lrc_init_wa_ctx(engine); 3210 3211 if (HAS_LOGICAL_RING_ELSQ(i915)) { 3212 execlists->submit_reg = uncore->regs + 3213 i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base)); 3214 execlists->ctrl_reg = uncore->regs + 3215 i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base)); 3216 } else { 3217 execlists->submit_reg = uncore->regs + 3218 i915_mmio_reg_offset(RING_ELSP(base)); 3219 } 3220 3221 execlists->csb_status = 3222 (u64 *)&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; 3223 3224 execlists->csb_write = 3225 &engine->status_page.addr[intel_hws_csb_write_index(i915)]; 3226 3227 if (INTEL_GEN(i915) < 11) 3228 execlists->csb_size = GEN8_CSB_ENTRIES; 3229 else 3230 execlists->csb_size = GEN11_CSB_ENTRIES; 3231 3232 engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); 3233 if (INTEL_GEN(engine->i915) >= 11) { 3234 execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32); 3235 execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32); 3236 } 3237 3238 /* Finally, take ownership and responsibility for cleanup! */ 3239 engine->sanitize = execlists_sanitize; 3240 engine->release = execlists_release; 3241 3242 return 0; 3243 } 3244 3245 static struct list_head *virtual_queue(struct virtual_engine *ve) 3246 { 3247 return &ve->base.execlists.default_priolist.requests[0]; 3248 } 3249 3250 static void rcu_virtual_context_destroy(struct work_struct *wrk) 3251 { 3252 struct virtual_engine *ve = 3253 container_of(wrk, typeof(*ve), rcu.work); 3254 unsigned int n; 3255 3256 GEM_BUG_ON(ve->context.inflight); 3257 3258 /* Preempt-to-busy may leave a stale request behind. */ 3259 if (unlikely(ve->request)) { 3260 struct i915_request *old; 3261 3262 spin_lock_irq(&ve->base.active.lock); 3263 3264 old = fetch_and_zero(&ve->request); 3265 if (old) { 3266 GEM_BUG_ON(!__i915_request_is_complete(old)); 3267 __i915_request_submit(old); 3268 i915_request_put(old); 3269 } 3270 3271 spin_unlock_irq(&ve->base.active.lock); 3272 } 3273 3274 /* 3275 * Flush the tasklet in case it is still running on another core. 3276 * 3277 * This needs to be done before we remove ourselves from the siblings' 3278 * rbtrees as in the case it is running in parallel, it may reinsert 3279 * the rb_node into a sibling. 3280 */ 3281 tasklet_kill(&ve->base.execlists.tasklet); 3282 3283 /* Decouple ourselves from the siblings, no more access allowed. */ 3284 for (n = 0; n < ve->num_siblings; n++) { 3285 struct intel_engine_cs *sibling = ve->siblings[n]; 3286 struct rb_node *node = &ve->nodes[sibling->id].rb; 3287 3288 if (RB_EMPTY_NODE(node)) 3289 continue; 3290 3291 spin_lock_irq(&sibling->active.lock); 3292 3293 /* Detachment is lazily performed in the execlists tasklet */ 3294 if (!RB_EMPTY_NODE(node)) 3295 rb_erase_cached(node, &sibling->execlists.virtual); 3296 3297 spin_unlock_irq(&sibling->active.lock); 3298 } 3299 GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); 3300 GEM_BUG_ON(!list_empty(virtual_queue(ve))); 3301 3302 lrc_fini(&ve->context); 3303 intel_context_fini(&ve->context); 3304 3305 intel_breadcrumbs_free(ve->base.breadcrumbs); 3306 intel_engine_free_request_pool(&ve->base); 3307 3308 kfree(ve->bonds); 3309 kfree(ve); 3310 } 3311 3312 static void virtual_context_destroy(struct kref *kref) 3313 { 3314 struct virtual_engine *ve = 3315 container_of(kref, typeof(*ve), context.ref); 3316 3317 GEM_BUG_ON(!list_empty(&ve->context.signals)); 3318 3319 /* 3320 * When destroying the virtual engine, we have to be aware that 3321 * it may still be in use from an hardirq/softirq context causing 3322 * the resubmission of a completed request (background completion 3323 * due to preempt-to-busy). Before we can free the engine, we need 3324 * to flush the submission code and tasklets that are still potentially 3325 * accessing the engine. Flushing the tasklets requires process context, 3326 * and since we can guard the resubmit onto the engine with an RCU read 3327 * lock, we can delegate the free of the engine to an RCU worker. 3328 */ 3329 INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy); 3330 queue_rcu_work(system_wq, &ve->rcu); 3331 } 3332 3333 static void virtual_engine_initial_hint(struct virtual_engine *ve) 3334 { 3335 int swp; 3336 3337 /* 3338 * Pick a random sibling on starting to help spread the load around. 3339 * 3340 * New contexts are typically created with exactly the same order 3341 * of siblings, and often started in batches. Due to the way we iterate 3342 * the array of sibling when submitting requests, sibling[0] is 3343 * prioritised for dequeuing. If we make sure that sibling[0] is fairly 3344 * randomised across the system, we also help spread the load by the 3345 * first engine we inspect being different each time. 3346 * 3347 * NB This does not force us to execute on this engine, it will just 3348 * typically be the first we inspect for submission. 3349 */ 3350 swp = prandom_u32_max(ve->num_siblings); 3351 if (swp) 3352 swap(ve->siblings[swp], ve->siblings[0]); 3353 } 3354 3355 static int virtual_context_alloc(struct intel_context *ce) 3356 { 3357 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3358 3359 return lrc_alloc(ce, ve->siblings[0]); 3360 } 3361 3362 static int virtual_context_pre_pin(struct intel_context *ce, 3363 struct i915_gem_ww_ctx *ww, 3364 void **vaddr) 3365 { 3366 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3367 3368 /* Note: we must use a real engine class for setting up reg state */ 3369 return lrc_pre_pin(ce, ve->siblings[0], ww, vaddr); 3370 } 3371 3372 static int virtual_context_pin(struct intel_context *ce, void *vaddr) 3373 { 3374 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3375 3376 return lrc_pin(ce, ve->siblings[0], vaddr); 3377 } 3378 3379 static void virtual_context_enter(struct intel_context *ce) 3380 { 3381 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3382 unsigned int n; 3383 3384 for (n = 0; n < ve->num_siblings; n++) 3385 intel_engine_pm_get(ve->siblings[n]); 3386 3387 intel_timeline_enter(ce->timeline); 3388 } 3389 3390 static void virtual_context_exit(struct intel_context *ce) 3391 { 3392 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3393 unsigned int n; 3394 3395 intel_timeline_exit(ce->timeline); 3396 3397 for (n = 0; n < ve->num_siblings; n++) 3398 intel_engine_pm_put(ve->siblings[n]); 3399 } 3400 3401 static const struct intel_context_ops virtual_context_ops = { 3402 .flags = COPS_HAS_INFLIGHT, 3403 3404 .alloc = virtual_context_alloc, 3405 3406 .pre_pin = virtual_context_pre_pin, 3407 .pin = virtual_context_pin, 3408 .unpin = lrc_unpin, 3409 .post_unpin = lrc_post_unpin, 3410 3411 .enter = virtual_context_enter, 3412 .exit = virtual_context_exit, 3413 3414 .destroy = virtual_context_destroy, 3415 }; 3416 3417 static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) 3418 { 3419 struct i915_request *rq; 3420 intel_engine_mask_t mask; 3421 3422 rq = READ_ONCE(ve->request); 3423 if (!rq) 3424 return 0; 3425 3426 /* The rq is ready for submission; rq->execution_mask is now stable. */ 3427 mask = rq->execution_mask; 3428 if (unlikely(!mask)) { 3429 /* Invalid selection, submit to a random engine in error */ 3430 i915_request_set_error_once(rq, -ENODEV); 3431 mask = ve->siblings[0]->mask; 3432 } 3433 3434 ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n", 3435 rq->fence.context, rq->fence.seqno, 3436 mask, ve->base.execlists.queue_priority_hint); 3437 3438 return mask; 3439 } 3440 3441 static void virtual_submission_tasklet(unsigned long data) 3442 { 3443 struct virtual_engine * const ve = (struct virtual_engine *)data; 3444 const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint); 3445 intel_engine_mask_t mask; 3446 unsigned int n; 3447 3448 rcu_read_lock(); 3449 mask = virtual_submission_mask(ve); 3450 rcu_read_unlock(); 3451 if (unlikely(!mask)) 3452 return; 3453 3454 for (n = 0; n < ve->num_siblings; n++) { 3455 struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); 3456 struct ve_node * const node = &ve->nodes[sibling->id]; 3457 struct rb_node **parent, *rb; 3458 bool first; 3459 3460 if (!READ_ONCE(ve->request)) 3461 break; /* already handled by a sibling's tasklet */ 3462 3463 spin_lock_irq(&sibling->active.lock); 3464 3465 if (unlikely(!(mask & sibling->mask))) { 3466 if (!RB_EMPTY_NODE(&node->rb)) { 3467 rb_erase_cached(&node->rb, 3468 &sibling->execlists.virtual); 3469 RB_CLEAR_NODE(&node->rb); 3470 } 3471 3472 goto unlock_engine; 3473 } 3474 3475 if (unlikely(!RB_EMPTY_NODE(&node->rb))) { 3476 /* 3477 * Cheat and avoid rebalancing the tree if we can 3478 * reuse this node in situ. 3479 */ 3480 first = rb_first_cached(&sibling->execlists.virtual) == 3481 &node->rb; 3482 if (prio == node->prio || (prio > node->prio && first)) 3483 goto submit_engine; 3484 3485 rb_erase_cached(&node->rb, &sibling->execlists.virtual); 3486 } 3487 3488 rb = NULL; 3489 first = true; 3490 parent = &sibling->execlists.virtual.rb_root.rb_node; 3491 while (*parent) { 3492 struct ve_node *other; 3493 3494 rb = *parent; 3495 other = rb_entry(rb, typeof(*other), rb); 3496 if (prio > other->prio) { 3497 parent = &rb->rb_left; 3498 } else { 3499 parent = &rb->rb_right; 3500 first = false; 3501 } 3502 } 3503 3504 rb_link_node(&node->rb, rb, parent); 3505 rb_insert_color_cached(&node->rb, 3506 &sibling->execlists.virtual, 3507 first); 3508 3509 submit_engine: 3510 GEM_BUG_ON(RB_EMPTY_NODE(&node->rb)); 3511 node->prio = prio; 3512 if (first && prio > sibling->execlists.queue_priority_hint) 3513 tasklet_hi_schedule(&sibling->execlists.tasklet); 3514 3515 unlock_engine: 3516 spin_unlock_irq(&sibling->active.lock); 3517 3518 if (intel_context_inflight(&ve->context)) 3519 break; 3520 } 3521 } 3522 3523 static void virtual_submit_request(struct i915_request *rq) 3524 { 3525 struct virtual_engine *ve = to_virtual_engine(rq->engine); 3526 unsigned long flags; 3527 3528 ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", 3529 rq->fence.context, 3530 rq->fence.seqno); 3531 3532 GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); 3533 3534 spin_lock_irqsave(&ve->base.active.lock, flags); 3535 3536 /* By the time we resubmit a request, it may be completed */ 3537 if (__i915_request_is_complete(rq)) { 3538 __i915_request_submit(rq); 3539 goto unlock; 3540 } 3541 3542 if (ve->request) { /* background completion from preempt-to-busy */ 3543 GEM_BUG_ON(!__i915_request_is_complete(ve->request)); 3544 __i915_request_submit(ve->request); 3545 i915_request_put(ve->request); 3546 } 3547 3548 ve->base.execlists.queue_priority_hint = rq_prio(rq); 3549 ve->request = i915_request_get(rq); 3550 3551 GEM_BUG_ON(!list_empty(virtual_queue(ve))); 3552 list_move_tail(&rq->sched.link, virtual_queue(ve)); 3553 3554 tasklet_hi_schedule(&ve->base.execlists.tasklet); 3555 3556 unlock: 3557 spin_unlock_irqrestore(&ve->base.active.lock, flags); 3558 } 3559 3560 static struct ve_bond * 3561 virtual_find_bond(struct virtual_engine *ve, 3562 const struct intel_engine_cs *master) 3563 { 3564 int i; 3565 3566 for (i = 0; i < ve->num_bonds; i++) { 3567 if (ve->bonds[i].master == master) 3568 return &ve->bonds[i]; 3569 } 3570 3571 return NULL; 3572 } 3573 3574 static void 3575 virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) 3576 { 3577 struct virtual_engine *ve = to_virtual_engine(rq->engine); 3578 intel_engine_mask_t allowed, exec; 3579 struct ve_bond *bond; 3580 3581 allowed = ~to_request(signal)->engine->mask; 3582 3583 bond = virtual_find_bond(ve, to_request(signal)->engine); 3584 if (bond) 3585 allowed &= bond->sibling_mask; 3586 3587 /* Restrict the bonded request to run on only the available engines */ 3588 exec = READ_ONCE(rq->execution_mask); 3589 while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed)) 3590 ; 3591 3592 /* Prevent the master from being re-run on the bonded engines */ 3593 to_request(signal)->execution_mask &= ~allowed; 3594 } 3595 3596 struct intel_context * 3597 intel_execlists_create_virtual(struct intel_engine_cs **siblings, 3598 unsigned int count) 3599 { 3600 struct virtual_engine *ve; 3601 unsigned int n; 3602 int err; 3603 3604 if (count == 0) 3605 return ERR_PTR(-EINVAL); 3606 3607 if (count == 1) 3608 return intel_context_create(siblings[0]); 3609 3610 ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL); 3611 if (!ve) 3612 return ERR_PTR(-ENOMEM); 3613 3614 ve->base.i915 = siblings[0]->i915; 3615 ve->base.gt = siblings[0]->gt; 3616 ve->base.uncore = siblings[0]->uncore; 3617 ve->base.id = -1; 3618 3619 ve->base.class = OTHER_CLASS; 3620 ve->base.uabi_class = I915_ENGINE_CLASS_INVALID; 3621 ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; 3622 ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; 3623 3624 /* 3625 * The decision on whether to submit a request using semaphores 3626 * depends on the saturated state of the engine. We only compute 3627 * this during HW submission of the request, and we need for this 3628 * state to be globally applied to all requests being submitted 3629 * to this engine. Virtual engines encompass more than one physical 3630 * engine and so we cannot accurately tell in advance if one of those 3631 * engines is already saturated and so cannot afford to use a semaphore 3632 * and be pessimized in priority for doing so -- if we are the only 3633 * context using semaphores after all other clients have stopped, we 3634 * will be starved on the saturated system. Such a global switch for 3635 * semaphores is less than ideal, but alas is the current compromise. 3636 */ 3637 ve->base.saturated = ALL_ENGINES; 3638 3639 snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); 3640 3641 intel_engine_init_active(&ve->base, ENGINE_VIRTUAL); 3642 intel_engine_init_execlists(&ve->base); 3643 3644 ve->base.cops = &virtual_context_ops; 3645 ve->base.request_alloc = execlists_request_alloc; 3646 3647 ve->base.schedule = i915_schedule; 3648 ve->base.submit_request = virtual_submit_request; 3649 ve->base.bond_execute = virtual_bond_execute; 3650 3651 INIT_LIST_HEAD(virtual_queue(ve)); 3652 ve->base.execlists.queue_priority_hint = INT_MIN; 3653 tasklet_init(&ve->base.execlists.tasklet, 3654 virtual_submission_tasklet, 3655 (unsigned long)ve); 3656 3657 intel_context_init(&ve->context, &ve->base); 3658 3659 ve->base.breadcrumbs = intel_breadcrumbs_create(NULL); 3660 if (!ve->base.breadcrumbs) { 3661 err = -ENOMEM; 3662 goto err_put; 3663 } 3664 3665 for (n = 0; n < count; n++) { 3666 struct intel_engine_cs *sibling = siblings[n]; 3667 3668 GEM_BUG_ON(!is_power_of_2(sibling->mask)); 3669 if (sibling->mask & ve->base.mask) { 3670 DRM_DEBUG("duplicate %s entry in load balancer\n", 3671 sibling->name); 3672 err = -EINVAL; 3673 goto err_put; 3674 } 3675 3676 /* 3677 * The virtual engine implementation is tightly coupled to 3678 * the execlists backend -- we push out request directly 3679 * into a tree inside each physical engine. We could support 3680 * layering if we handle cloning of the requests and 3681 * submitting a copy into each backend. 3682 */ 3683 if (sibling->execlists.tasklet.func != 3684 execlists_submission_tasklet) { 3685 err = -ENODEV; 3686 goto err_put; 3687 } 3688 3689 GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb)); 3690 RB_CLEAR_NODE(&ve->nodes[sibling->id].rb); 3691 3692 ve->siblings[ve->num_siblings++] = sibling; 3693 ve->base.mask |= sibling->mask; 3694 3695 /* 3696 * All physical engines must be compatible for their emission 3697 * functions (as we build the instructions during request 3698 * construction and do not alter them before submission 3699 * on the physical engine). We use the engine class as a guide 3700 * here, although that could be refined. 3701 */ 3702 if (ve->base.class != OTHER_CLASS) { 3703 if (ve->base.class != sibling->class) { 3704 DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n", 3705 sibling->class, ve->base.class); 3706 err = -EINVAL; 3707 goto err_put; 3708 } 3709 continue; 3710 } 3711 3712 ve->base.class = sibling->class; 3713 ve->base.uabi_class = sibling->uabi_class; 3714 snprintf(ve->base.name, sizeof(ve->base.name), 3715 "v%dx%d", ve->base.class, count); 3716 ve->base.context_size = sibling->context_size; 3717 3718 ve->base.emit_bb_start = sibling->emit_bb_start; 3719 ve->base.emit_flush = sibling->emit_flush; 3720 ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb; 3721 ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb; 3722 ve->base.emit_fini_breadcrumb_dw = 3723 sibling->emit_fini_breadcrumb_dw; 3724 3725 ve->base.flags = sibling->flags; 3726 } 3727 3728 ve->base.flags |= I915_ENGINE_IS_VIRTUAL; 3729 3730 virtual_engine_initial_hint(ve); 3731 return &ve->context; 3732 3733 err_put: 3734 intel_context_put(&ve->context); 3735 return ERR_PTR(err); 3736 } 3737 3738 struct intel_context * 3739 intel_execlists_clone_virtual(struct intel_engine_cs *src) 3740 { 3741 struct virtual_engine *se = to_virtual_engine(src); 3742 struct intel_context *dst; 3743 3744 dst = intel_execlists_create_virtual(se->siblings, 3745 se->num_siblings); 3746 if (IS_ERR(dst)) 3747 return dst; 3748 3749 if (se->num_bonds) { 3750 struct virtual_engine *de = to_virtual_engine(dst->engine); 3751 3752 de->bonds = kmemdup(se->bonds, 3753 sizeof(*se->bonds) * se->num_bonds, 3754 GFP_KERNEL); 3755 if (!de->bonds) { 3756 intel_context_put(dst); 3757 return ERR_PTR(-ENOMEM); 3758 } 3759 3760 de->num_bonds = se->num_bonds; 3761 } 3762 3763 return dst; 3764 } 3765 3766 int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, 3767 const struct intel_engine_cs *master, 3768 const struct intel_engine_cs *sibling) 3769 { 3770 struct virtual_engine *ve = to_virtual_engine(engine); 3771 struct ve_bond *bond; 3772 int n; 3773 3774 /* Sanity check the sibling is part of the virtual engine */ 3775 for (n = 0; n < ve->num_siblings; n++) 3776 if (sibling == ve->siblings[n]) 3777 break; 3778 if (n == ve->num_siblings) 3779 return -EINVAL; 3780 3781 bond = virtual_find_bond(ve, master); 3782 if (bond) { 3783 bond->sibling_mask |= sibling->mask; 3784 return 0; 3785 } 3786 3787 bond = krealloc(ve->bonds, 3788 sizeof(*bond) * (ve->num_bonds + 1), 3789 GFP_KERNEL); 3790 if (!bond) 3791 return -ENOMEM; 3792 3793 bond[ve->num_bonds].master = master; 3794 bond[ve->num_bonds].sibling_mask = sibling->mask; 3795 3796 ve->bonds = bond; 3797 ve->num_bonds++; 3798 3799 return 0; 3800 } 3801 3802 void intel_execlists_show_requests(struct intel_engine_cs *engine, 3803 struct drm_printer *m, 3804 void (*show_request)(struct drm_printer *m, 3805 const struct i915_request *rq, 3806 const char *prefix, 3807 int indent), 3808 unsigned int max) 3809 { 3810 const struct intel_engine_execlists *execlists = &engine->execlists; 3811 struct i915_request *rq, *last; 3812 unsigned long flags; 3813 unsigned int count; 3814 struct rb_node *rb; 3815 3816 spin_lock_irqsave(&engine->active.lock, flags); 3817 3818 last = NULL; 3819 count = 0; 3820 list_for_each_entry(rq, &engine->active.requests, sched.link) { 3821 if (count++ < max - 1) 3822 show_request(m, rq, "\t\t", 0); 3823 else 3824 last = rq; 3825 } 3826 if (last) { 3827 if (count > max) { 3828 drm_printf(m, 3829 "\t\t...skipping %d executing requests...\n", 3830 count - max); 3831 } 3832 show_request(m, last, "\t\t", 0); 3833 } 3834 3835 if (execlists->queue_priority_hint != INT_MIN) 3836 drm_printf(m, "\t\tQueue priority hint: %d\n", 3837 READ_ONCE(execlists->queue_priority_hint)); 3838 3839 last = NULL; 3840 count = 0; 3841 for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) { 3842 struct i915_priolist *p = rb_entry(rb, typeof(*p), node); 3843 int i; 3844 3845 priolist_for_each_request(rq, p, i) { 3846 if (count++ < max - 1) 3847 show_request(m, rq, "\t\t", 0); 3848 else 3849 last = rq; 3850 } 3851 } 3852 if (last) { 3853 if (count > max) { 3854 drm_printf(m, 3855 "\t\t...skipping %d queued requests...\n", 3856 count - max); 3857 } 3858 show_request(m, last, "\t\t", 0); 3859 } 3860 3861 last = NULL; 3862 count = 0; 3863 for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) { 3864 struct virtual_engine *ve = 3865 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 3866 struct i915_request *rq = READ_ONCE(ve->request); 3867 3868 if (rq) { 3869 if (count++ < max - 1) 3870 show_request(m, rq, "\t\t", 0); 3871 else 3872 last = rq; 3873 } 3874 } 3875 if (last) { 3876 if (count > max) { 3877 drm_printf(m, 3878 "\t\t...skipping %d virtual requests...\n", 3879 count - max); 3880 } 3881 show_request(m, last, "\t\t", 0); 3882 } 3883 3884 spin_unlock_irqrestore(&engine->active.lock, flags); 3885 } 3886 3887 bool 3888 intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine) 3889 { 3890 return engine->set_default_submission == 3891 execlists_set_default_submission; 3892 } 3893 3894 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) 3895 #include "selftest_execlists.c" 3896 #endif 3897