// SPDX-License-Identifier: MIT
/*
 * Copyright © 2014 Intel Corporation
 */

/**
 * DOC: Logical Rings, Logical Ring Contexts and Execlists
 *
 * Motivation:
 * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
 * These expanded contexts enable a number of new abilities, especially
 * "Execlists" (also implemented in this file).
 *
 * One of the main differences with the legacy HW contexts is that logical
 * ring contexts incorporate many more things into the context's state, like
 * PDPs or ringbuffer control registers:
 *
 * The reason why PDPs are included in the context is straightforward: as
 * PPGTTs (per-process GTTs) are actually per-context, having the PDPs
 * contained there means you don't need to do a ppgtt->switch_mm yourself,
 * instead, the GPU will do it for you on the context switch.
 *
 * But, what about the ringbuffer control registers (head, tail, etc.)?
 * Shouldn't we just need a set of those per engine command streamer? This is
 * where the name "Logical Rings" starts to make sense: by virtualizing the
 * rings, the engine cs shifts to a new "ring buffer" with every context
 * switch. When you want to submit a workload to the GPU you: A) choose your
 * context, B) find its appropriate virtualized ring, C) write commands to it
 * and then, finally, D) tell the GPU to switch to that context.
 *
 * Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
 * to a context is via a context execution list, ergo "Execlists".
 *
 * LRC implementation:
 * Regarding the creation of contexts, we have:
 *
 * - One global default context.
 * - One local default context for each opened fd.
 * - One local extra context for each context create ioctl call.
 *
 * Now that ringbuffers belong per-context (and not per-engine, like before)
 * and that contexts are uniquely tied to a given engine (and not reusable,
 * like before) we need:
 *
 * - One ringbuffer per-engine inside each context.
 * - One backing object per-engine inside each context.
 *
 * The global default context starts its life with these new objects fully
 * allocated and populated. The local default context for each opened fd is
 * more complex, because we don't know at creation time which engine is going
 * to use them. To handle this, we have implemented a deferred creation of LR
 * contexts:
 *
 * The local context starts its life as a hollow or blank holder, that only
 * gets populated for a given engine once we receive an execbuffer. If later
 * on we receive another execbuffer ioctl for the same context but a different
 * engine, we allocate/populate a new ringbuffer and context backing object and
 * so on.
 *
 * Finally, regarding local contexts created using the ioctl call: as they are
 * only allowed with the render ring, we can allocate & populate them right
 * away (no need to defer anything, at least for now).
 *
 * Execlists implementation:
 * Execlists are the new method by which, on gen8+ hardware, workloads are
 * submitted for execution (as opposed to the legacy, ringbuffer-based, method).
 * This method works as follows:
 *
 * When a request is committed, its commands (the BB start and any leading or
 * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
 * for the appropriate context. The tail pointer in the hardware context is not
 * updated at this time, but instead, kept by the driver in the ringbuffer
 * structure. A structure representing this request is added to a request queue
 * for the appropriate engine: this structure contains a copy of the context's
 * tail after the request was written to the ring buffer and a pointer to the
 * context itself.
 *
 * If the engine's request queue was empty before the request was added, the
 * queue is processed immediately. Otherwise the queue will be processed during
 * a context switch interrupt. In any case, elements on the queue will get sent
 * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
 * globally unique 20-bit submission ID.
 *
 * When execution of a request completes, the GPU updates the context status
 * buffer with a context complete event and generates a context switch interrupt.
 * During the interrupt handling, the driver examines the events in the buffer:
 * for each context complete event, if the announced ID matches that on the head
 * of the request queue, then that request is retired and removed from the queue.
 *
 * After processing, if any requests were retired and the queue is not empty
 * then a new execution list can be submitted. The two requests at the front of
 * the queue are next to be submitted but since a context may not occur twice in
 * an execution list, if subsequent requests have the same ID as the first then
 * the two requests must be combined. This is done simply by discarding requests
 * at the head of the queue until either only one request is left (in which case
 * we use a NULL second context) or the first two requests have unique IDs.
 *
 * By always executing the first two requests in the queue the driver ensures
 * that the GPU is kept as busy as possible. In the case where a single context
 * completes but a second context is still executing, the request for this second
 * context will be at the head of the queue when we remove the first one. This
 * request will then be resubmitted along with a new request for a different context,
 * which will cause the hardware to continue executing the second request and queue
 * the new request (the GPU detects the condition of a context getting preempted
 * with the same context and optimizes the context switch flow by not doing
 * preemption, but just sampling the new tail pointer).
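 *
 * As a rough, illustrative sketch of the pairing rule above (the helper
 * names here are hypothetical and used only for exposition, they are not
 * functions in this driver):
 *
 *	elsp[0] = pop_head(queue);
 *	while (peek_head(queue) &&
 *	       peek_head(queue)->context == elsp[0]->context)
 *		elsp[0] = pop_head(queue);	(coalesce the same context)
 *	elsp[1] = pop_head(queue);		(a different context, or NULL)
 *	submit_elsp(engine, elsp);
 *
 * That is, consecutive requests for the same context collapse into a single
 * port, and the second port only ever carries a different context.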
 *
 */
#include <linux/interrupt.h>

#include "i915_drv.h"
#include "i915_trace.h"
#include "i915_vgpu.h"
#include "gen8_engine_cs.h"
#include "intel_breadcrumbs.h"
#include "intel_context.h"
#include "intel_engine_pm.h"
#include "intel_engine_stats.h"
#include "intel_execlists_submission.h"
#include "intel_gt.h"
#include "intel_gt_pm.h"
#include "intel_gt_requests.h"
#include "intel_lrc.h"
#include "intel_lrc_reg.h"
#include "intel_mocs.h"
#include "intel_reset.h"
#include "intel_ring.h"
#include "intel_workarounds.h"
#include "shmem_utils.h"

#define RING_EXECLIST_QFULL		(1 << 0x2)
#define RING_EXECLIST1_VALID		(1 << 0x3)
#define RING_EXECLIST0_VALID		(1 << 0x4)
#define RING_EXECLIST_ACTIVE_STATUS	(3 << 0xE)
#define RING_EXECLIST1_ACTIVE		(1 << 0x11)
#define RING_EXECLIST0_ACTIVE		(1 << 0x12)

#define GEN8_CTX_STATUS_IDLE_ACTIVE	(1 << 0)
#define GEN8_CTX_STATUS_PREEMPTED	(1 << 1)
#define GEN8_CTX_STATUS_ELEMENT_SWITCH	(1 << 2)
#define GEN8_CTX_STATUS_ACTIVE_IDLE	(1 << 3)
#define GEN8_CTX_STATUS_COMPLETE	(1 << 4)
#define GEN8_CTX_STATUS_LITE_RESTORE	(1 << 15)

#define GEN8_CTX_STATUS_COMPLETED_MASK \
	 (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED)

#define GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE	(0x1) /* lower csb dword */
#define GEN12_CTX_SWITCH_DETAIL(csb_dw)	((csb_dw) & 0xF) /* upper csb dword */
#define GEN12_CSB_SW_CTX_ID_MASK		GENMASK(25, 15)
#define GEN12_IDLE_CTX_ID		0x7FF
#define GEN12_CSB_CTX_VALID(csb_dw) \
	(FIELD_GET(GEN12_CSB_SW_CTX_ID_MASK, csb_dw) != GEN12_IDLE_CTX_ID)

/* Typical size of the average request (2 pipecontrols and a MI_BB) */
#define EXECLISTS_REQUEST_SIZE 64 /* bytes */

struct virtual_engine {
	struct intel_engine_cs base;
	struct intel_context context;
	struct rcu_work rcu;

	/*
	 * We allow only a single request through the virtual engine at a time
	 * (each request in the timeline waits for the completion fence of
	 * the previous before being submitted). By restricting ourselves to
	 * only submitting a single request, each request is placed on to a
	 * physical engine to maximise load spreading (by virtue of the late
	 * greedy scheduling -- each real engine takes the next available
	 * request upon idling).
	 */
	struct i915_request *request;

	/*
	 * We keep a rbtree of available virtual engines inside each physical
	 * engine, sorted by priority. Here we preallocate the nodes we need
	 * for the virtual engine, indexed by physical_engine->id.
	 */
	struct ve_node {
		struct rb_node rb;
		int prio;
	} nodes[I915_NUM_ENGINES];

	/*
	 * Keep track of bonded pairs -- restrictions upon our selection
	 * of physical engines any particular request may be submitted to.
	 * If we receive a submit-fence from a master engine, we will only
	 * use one of sibling_mask physical engines.
	 */
	struct ve_bond {
		const struct intel_engine_cs *master;
		intel_engine_mask_t sibling_mask;
	} *bonds;
	unsigned int num_bonds;

	/* And finally, which physical engines this virtual engine maps onto. */
	unsigned int num_siblings;
	struct intel_engine_cs *siblings[];
};

static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
{
	GEM_BUG_ON(!intel_engine_is_virtual(engine));
	return container_of(engine, struct virtual_engine, base);
}

static struct i915_request *
__active_request(const struct intel_timeline * const tl,
		 struct i915_request *rq,
		 int error)
{
	struct i915_request *active = rq;

	list_for_each_entry_from_reverse(rq, &tl->requests, link) {
		if (__i915_request_is_complete(rq))
			break;

		if (error) {
			i915_request_set_error_once(rq, error);
			__i915_request_skip(rq);
		}
		active = rq;
	}

	return active;
}

static struct i915_request *
active_request(const struct intel_timeline * const tl, struct i915_request *rq)
{
	return __active_request(tl, rq, 0);
}

static void ring_set_paused(const struct intel_engine_cs *engine, int state)
{
	/*
	 * We inspect HWS_PREEMPT with a semaphore inside
	 * engine->emit_fini_breadcrumb. If the dword is true,
	 * the ring is paused as the semaphore will busywait
	 * until the dword is false.
	 */
	engine->status_page.addr[I915_GEM_HWS_PREEMPT] = state;
	if (state)
		wmb();
}

static struct i915_priolist *to_priolist(struct rb_node *rb)
{
	return rb_entry(rb, struct i915_priolist, node);
}

static int rq_prio(const struct i915_request *rq)
{
	return READ_ONCE(rq->sched.attr.priority);
}

static int effective_prio(const struct i915_request *rq)
{
	int prio = rq_prio(rq);

	/*
	 * If this request is special and must not be interrupted at any
	 * cost, so be it. Note we are only checking the most recent request
	 * in the context and so may be masking an earlier vip request. It
	 * is hoped that under the conditions where nopreempt is used, this
	 * will not matter (i.e. all requests to that context will be
	 * nopreempt for as long as desired).
	 */
	if (i915_request_has_nopreempt(rq))
		prio = I915_PRIORITY_UNPREEMPTABLE;

	return prio;
}

static int queue_prio(const struct intel_engine_execlists *execlists)
{
	struct rb_node *rb;

	rb = rb_first_cached(&execlists->queue);
	if (!rb)
		return INT_MIN;

	return to_priolist(rb)->priority;
}

static int virtual_prio(const struct intel_engine_execlists *el)
{
	struct rb_node *rb = rb_first_cached(&el->virtual);

	return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
}

static bool need_preempt(const struct intel_engine_cs *engine,
			 const struct i915_request *rq)
{
	int last_prio;

	if (!intel_engine_has_semaphores(engine))
		return false;

	/*
	 * Check if the current priority hint merits a preemption attempt.
	 *
	 * We record the highest value priority we saw during rescheduling
	 * prior to this dequeue, therefore we know that if it is strictly
	 * less than the current tail of ELSP[0], we do not need to force
	 * a preempt-to-idle cycle.
	 *
	 * However, the priority hint is a mere hint that we may need to
	 * preempt. If that hint is stale or we may be trying to preempt
	 * ourselves, ignore the request.
	 *
	 * More naturally we would write
	 *	prio >= max(0, last);
	 * except that we wish to prevent triggering preemption at the same
	 * priority level: the task that is running should remain running
	 * to preserve FIFO ordering of dependencies.
	 */
	last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1);
	if (engine->execlists.queue_priority_hint <= last_prio)
		return false;

	/*
	 * Check against the first request in ELSP[1], it will, thanks to the
	 * power of PI, be the highest priority of that context.
	 */
	if (!list_is_last(&rq->sched.link, &engine->active.requests) &&
	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
		return true;

	/*
	 * If the inflight context did not trigger the preemption, then maybe
	 * it was the set of queued requests? Pick the highest priority in
	 * the queue (the first active priolist) and see if it deserves to be
	 * running instead of ELSP[0].
	 *
	 * The highest priority request in the queue cannot be in either
	 * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same
	 * context, its priority would not exceed ELSP[0] aka last_prio.
	 */
	return max(virtual_prio(&engine->execlists),
		   queue_prio(&engine->execlists)) > last_prio;
}

__maybe_unused static bool
assert_priority_queue(const struct i915_request *prev,
		      const struct i915_request *next)
{
	/*
	 * Without preemption, the prev may refer to the still active element
	 * which we refuse to let go.
	 *
	 * Even with preemption, there are times when we think it is better not
	 * to preempt and leave an ostensibly lower priority request in flight.
	 */
	if (i915_request_is_active(prev))
		return true;

	return rq_prio(prev) >= rq_prio(next);
}

static struct i915_request *
__unwind_incomplete_requests(struct intel_engine_cs *engine)
{
	struct i915_request *rq, *rn, *active = NULL;
	struct list_head *pl;
	int prio = I915_PRIORITY_INVALID;

	lockdep_assert_held(&engine->active.lock);

	list_for_each_entry_safe_reverse(rq, rn,
					 &engine->active.requests,
					 sched.link) {
		if (__i915_request_is_complete(rq)) {
			list_del_init(&rq->sched.link);
			continue;
		}

		__i915_request_unsubmit(rq);

		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
		if (rq_prio(rq) != prio) {
			prio = rq_prio(rq);
			pl = i915_sched_lookup_priolist(engine, prio);
		}
		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));

		list_move(&rq->sched.link, pl);
		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);

		/* Check in case we rollback so far we wrap [size/2] */
		if (intel_ring_direction(rq->ring,
					 rq->tail,
					 rq->ring->tail + 8) > 0)
			rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE;

		active = rq;
	}

	return active;
}

struct i915_request *
execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists)
{
	struct intel_engine_cs *engine =
		container_of(execlists, typeof(*engine), execlists);

	return __unwind_incomplete_requests(engine);
}

static void
execlists_context_status_change(struct i915_request *rq, unsigned long status)
{
	/*
	 * Only used when GVT-g is enabled now. When GVT-g is disabled,
	 * the compiler should eliminate this function as dead-code.
	 */
	if (!IS_ENABLED(CONFIG_DRM_I915_GVT))
		return;

	atomic_notifier_call_chain(&rq->engine->context_status_notifier,
				   status, rq);
}

static void reset_active(struct i915_request *rq,
			 struct intel_engine_cs *engine)
{
	struct intel_context * const ce = rq->context;
	u32 head;

	/*
	 * The executing context has been cancelled. We want to prevent
	 * further execution along this context and propagate the error on
	 * to anything depending on its results.
	 *
	 * In __i915_request_submit(), we apply the -EIO and remove the
	 * requests' payloads for any banned requests. But first, we must
	 * rewind the context back to the start of the incomplete request so
	 * that we do not jump back into the middle of the batch.
	 *
	 * We preserve the breadcrumbs and semaphores of the incomplete
	 * requests so that inter-timeline dependencies (i.e. other timelines)
	 * remain correctly ordered. And we defer to __i915_request_submit()
	 * so that all asynchronous waits are correctly handled.
	 */
	ENGINE_TRACE(engine, "{ reset rq=%llx:%lld }\n",
		     rq->fence.context, rq->fence.seqno);

	/* On resubmission of the active request, payload will be scrubbed */
	if (__i915_request_is_complete(rq))
		head = rq->tail;
	else
		head = __active_request(ce->timeline, rq, -EIO)->head;
	head = intel_ring_wrap(ce->ring, head);

	/* Scrub the context image to prevent replaying the previous batch */
	lrc_init_regs(ce, engine, true);

	/* We've switched away, so this should be a no-op, but intent matters */
	ce->lrc.lrca = lrc_update_regs(ce, engine, head);
}

static bool bad_request(const struct i915_request *rq)
{
	return rq->fence.error && i915_request_started(rq);
}

static struct intel_engine_cs *
__execlists_schedule_in(struct i915_request *rq)
{
	struct intel_engine_cs * const engine = rq->engine;
	struct intel_context * const ce = rq->context;

	intel_context_get(ce);

	if (unlikely(intel_context_is_closed(ce) &&
		     !intel_engine_has_heartbeat(engine)))
		intel_context_set_banned(ce);

	if (unlikely(intel_context_is_banned(ce) || bad_request(rq)))
		reset_active(rq, engine);

	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
		lrc_check_regs(ce, engine, "before");

	if (ce->tag) {
		/* Use a fixed tag for OA and friends */
		GEM_BUG_ON(ce->tag <= BITS_PER_LONG);
		ce->lrc.ccid = ce->tag;
	} else {
		/* We don't need a strict matching tag, just different values */
		unsigned int tag = __ffs(engine->context_tag);

		GEM_BUG_ON(tag >= BITS_PER_LONG);
		__clear_bit(tag, &engine->context_tag);
		ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32);

		BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID);
	}

	ce->lrc.ccid |= engine->execlists.ccid;

	__intel_gt_pm_get(engine->gt);
	if (engine->fw_domain && !engine->fw_active++)
		intel_uncore_forcewake_get(engine->uncore, engine->fw_domain);
	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
	intel_engine_context_in(engine);

	CE_TRACE(ce, "schedule-in, ccid:%x\n", ce->lrc.ccid);

	return engine;
}

static void execlists_schedule_in(struct i915_request *rq, int idx)
{
	struct intel_context * const ce = rq->context;
	struct intel_engine_cs *old;

	GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine));
	trace_i915_request_in(rq, idx);

	old = ce->inflight;
	if (!old)
		old = __execlists_schedule_in(rq);
	WRITE_ONCE(ce->inflight, ptr_inc(old));

	GEM_BUG_ON(intel_context_inflight(ce) != rq->engine);
}

static void
resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
{
	struct intel_engine_cs *engine = rq->engine;

	spin_lock_irq(&engine->active.lock);

	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
	WRITE_ONCE(rq->engine, &ve->base);
	ve->base.submit_request(rq);

	spin_unlock_irq(&engine->active.lock);
}

static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
	struct intel_engine_cs *engine = rq->engine;

	/*
	 * After this point, the rq may be transferred to a new sibling, so
	 * before we clear ce->inflight make sure that the context has been
	 * removed from the b->signalers and furthermore we need to make sure
	 * that the concurrent iterator in signal_irq_work is no longer
	 * following ce->signal_link.
	 */
	if (!list_empty(&ce->signals))
		intel_context_remove_breadcrumbs(ce, engine->breadcrumbs);

	/*
	 * This engine is now too busy to run this virtual request, so
	 * see if we can find an alternative engine for it to execute on.
	 * Once a request has become bonded to this engine, we treat it the
	 * same as any other native request.
	 */
	if (i915_request_in_priority_queue(rq) &&
	    rq->execution_mask != engine->mask)
		resubmit_virtual_request(rq, ve);

	if (READ_ONCE(ve->request))
		tasklet_hi_schedule(&ve->base.execlists.tasklet);
}

static void __execlists_schedule_out(struct i915_request * const rq,
				     struct intel_context * const ce)
{
	struct intel_engine_cs * const engine = rq->engine;
	unsigned int ccid;

	/*
	 * NB process_csb() is not under the engine->active.lock and hence
	 * schedule_out can race with schedule_in meaning that we should
	 * refrain from doing non-trivial work here.
	 */

	CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid);
	GEM_BUG_ON(ce->inflight != engine);

	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
		lrc_check_regs(ce, engine, "after");

	/*
	 * If we have just completed this context, the engine may now be
	 * idle and we want to re-enter powersaving.
	 */
	if (intel_timeline_is_last(ce->timeline, rq) &&
	    __i915_request_is_complete(rq))
		intel_engine_add_retire(engine, ce->timeline);

	ccid = ce->lrc.ccid;
	ccid >>= GEN11_SW_CTX_ID_SHIFT - 32;
	ccid &= GEN12_MAX_CONTEXT_HW_ID;
	if (ccid < BITS_PER_LONG) {
		GEM_BUG_ON(ccid == 0);
		GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag));
		__set_bit(ccid - 1, &engine->context_tag);
	}

	lrc_update_runtime(ce);
	intel_engine_context_out(engine);
	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
	if (engine->fw_domain && !--engine->fw_active)
		intel_uncore_forcewake_put(engine->uncore, engine->fw_domain);
	intel_gt_pm_put_async(engine->gt);

	/*
	 * If this is part of a virtual engine, its next request may
	 * have been blocked waiting for access to the active context.
	 * We have to kick all the siblings again in case we need to
	 * switch (e.g. the next request is not runnable on this
	 * engine). Hopefully, we will already have submitted the next
	 * request before the tasklet runs and do not need to rebuild
	 * each virtual tree and kick everyone again.
	 */
	if (ce->engine != engine)
		kick_siblings(rq, ce);

	WRITE_ONCE(ce->inflight, NULL);
	intel_context_put(ce);
}

static inline void execlists_schedule_out(struct i915_request *rq)
{
	struct intel_context * const ce = rq->context;

	trace_i915_request_out(rq);

	GEM_BUG_ON(!ce->inflight);
	ce->inflight = ptr_dec(ce->inflight);
	if (!__intel_context_inflight_count(ce->inflight))
		__execlists_schedule_out(rq, ce);

	i915_request_put(rq);
}

static u64 execlists_update_context(struct i915_request *rq)
{
	struct intel_context *ce = rq->context;
	u64 desc = ce->lrc.desc;
	u32 tail, prev;

	/*
	 * WaIdleLiteRestore:bdw,skl
	 *
	 * We should never submit the context with the same RING_TAIL twice
	 * just in case we submit an empty ring, which confuses the HW.
	 *
	 * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of
	 * the normal request to be able to always advance the RING_TAIL on
	 * subsequent resubmissions (for lite restore). Should that fail us,
	 * and we try and submit the same tail again, force the context
	 * reload.
	 *
	 * If we need to return to a preempted context, we need to skip the
	 * lite-restore and force it to reload the RING_TAIL. Otherwise, the
	 * HW has a tendency to ignore us rewinding the TAIL to the end of
	 * an earlier request.
	 */
	GEM_BUG_ON(ce->lrc_reg_state[CTX_RING_TAIL] != rq->ring->tail);
	prev = rq->ring->tail;
	tail = intel_ring_set_tail(rq->ring, rq->tail);
	if (unlikely(intel_ring_direction(rq->ring, tail, prev) <= 0))
		desc |= CTX_DESC_FORCE_RESTORE;
	ce->lrc_reg_state[CTX_RING_TAIL] = tail;
	rq->tail = rq->wa_tail;

	/*
	 * Make sure the context image is complete before we submit it to HW.
	 *
	 * Ostensibly, writes (including the WCB) should be flushed prior to
	 * an uncached write such as our mmio register access, but the
	 * empirical evidence (esp. on Braswell) suggests that the WC write
	 * into memory may not be visible to the HW prior to the completion
	 * of the UC register write and that we may begin execution from the
	 * context before its image is complete leading to invalid PD chasing.
	 */
	wmb();

	ce->lrc.desc &= ~CTX_DESC_FORCE_RESTORE;
	return desc;
}

static void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port)
{
	if (execlists->ctrl_reg) {
		writel(lower_32_bits(desc), execlists->submit_reg + port * 2);
		writel(upper_32_bits(desc), execlists->submit_reg + port * 2 + 1);
	} else {
		writel(upper_32_bits(desc), execlists->submit_reg);
		writel(lower_32_bits(desc), execlists->submit_reg);
	}
}

static __maybe_unused char *
dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq)
{
	if (!rq)
		return "";

	snprintf(buf, buflen, "%sccid:%x %llx:%lld%s prio %d",
		 prefix,
		 rq->context->lrc.ccid,
		 rq->fence.context, rq->fence.seqno,
		 __i915_request_is_complete(rq) ? "!" :
		 __i915_request_has_started(rq) ? "*" :
		 "",
		 rq_prio(rq));

	return buf;
}

static __maybe_unused noinline void
trace_ports(const struct intel_engine_execlists *execlists,
	    const char *msg,
	    struct i915_request * const *ports)
{
	const struct intel_engine_cs *engine =
		container_of(execlists, typeof(*engine), execlists);
	char __maybe_unused p0[40], p1[40];

	if (!ports[0])
		return;

	ENGINE_TRACE(engine, "%s { %s%s }\n", msg,
		     dump_port(p0, sizeof(p0), "", ports[0]),
		     dump_port(p1, sizeof(p1), ", ", ports[1]));
}

static bool
reset_in_progress(const struct intel_engine_execlists *execlists)
{
	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
}

static __maybe_unused noinline bool
assert_pending_valid(const struct intel_engine_execlists *execlists,
		     const char *msg)
{
	struct intel_engine_cs *engine =
		container_of(execlists, typeof(*engine), execlists);
	struct i915_request * const *port, *rq, *prev = NULL;
	struct intel_context *ce = NULL;
	u32 ccid = -1;

	trace_ports(execlists, msg, execlists->pending);

	/* We may be messing around with the lists during reset, lalala */
	if (reset_in_progress(execlists))
		return true;

	if (!execlists->pending[0]) {
		GEM_TRACE_ERR("%s: Nothing pending for promotion!\n",
			      engine->name);
		return false;
	}

	if (execlists->pending[execlists_num_ports(execlists)]) {
		GEM_TRACE_ERR("%s: Excess pending[%d] for promotion!\n",
			      engine->name, execlists_num_ports(execlists));
		return false;
	}

	for (port = execlists->pending; (rq = *port); port++) {
		unsigned long flags;
		bool ok = true;

		GEM_BUG_ON(!kref_read(&rq->fence.refcount));
		GEM_BUG_ON(!i915_request_is_active(rq));

		if (ce == rq->context) {
			GEM_TRACE_ERR("%s: Dup context:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}
		ce = rq->context;

		if (ccid == ce->lrc.ccid) {
			GEM_TRACE_ERR("%s: Dup ccid:%x context:%llx in pending[%zd]\n",
				      engine->name,
				      ccid, ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}
		ccid = ce->lrc.ccid;

		/*
		 * Sentinels are supposed to be the last request so they flush
		 * the current execution off the HW. Check that they are the only
		 * request in the pending submission.
		 *
		 * NB: Due to the async nature of preempt-to-busy and request
		 * cancellation we need to handle the case where a request
		 * becomes a sentinel in parallel to CSB processing.
		 */
		if (prev && i915_request_has_sentinel(prev) &&
		    !READ_ONCE(prev->fence.error)) {
			GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}
		prev = rq;

		/*
		 * We want virtual requests to only be in the first slot so
		 * that they are never stuck behind a hog and can be immediately
		 * transferred onto the next idle engine.
		 */
		if (rq->execution_mask != engine->mask &&
		    port != execlists->pending) {
			GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			return false;
		}

		/* Hold tightly onto the lock to prevent concurrent retires! */
		if (!spin_trylock_irqsave(&rq->lock, flags))
			continue;

		if (__i915_request_is_complete(rq))
			goto unlock;

		if (i915_active_is_idle(&ce->active) &&
		    !intel_context_is_barrier(ce)) {
			GEM_TRACE_ERR("%s: Inactive context:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			ok = false;
			goto unlock;
		}

		if (!i915_vma_is_pinned(ce->state)) {
			GEM_TRACE_ERR("%s: Unpinned context:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			ok = false;
			goto unlock;
		}

		if (!i915_vma_is_pinned(ce->ring->vma)) {
			GEM_TRACE_ERR("%s: Unpinned ring:%llx in pending[%zd]\n",
				      engine->name,
				      ce->timeline->fence_context,
				      port - execlists->pending);
			ok = false;
			goto unlock;
		}

unlock:
		spin_unlock_irqrestore(&rq->lock, flags);
		if (!ok)
			return false;
	}

	return ce;
}

static void execlists_submit_ports(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists *execlists = &engine->execlists;
	unsigned int n;

	GEM_BUG_ON(!assert_pending_valid(execlists, "submit"));

	/*
	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
	 * not be relinquished until the device is idle (see
	 * i915_gem_idle_work_handler()). As a precaution, we make sure
	 * that all ELSP are drained i.e. we have processed the CSB,
	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
	 */
	GEM_BUG_ON(!intel_engine_pm_is_awake(engine));

	/*
	 * ELSQ note: the submit queue is not cleared after being submitted
	 * to the HW so we need to make sure we always clean it up. This is
	 * currently ensured by the fact that we always write the same number
	 * of elsq entries, keep this in mind before changing the loop below.
	 */
	for (n = execlists_num_ports(execlists); n--; ) {
		struct i915_request *rq = execlists->pending[n];

		write_desc(execlists,
			   rq ? execlists_update_context(rq) : 0,
			   n);
	}

	/* we need to manually load the submit queue */
	if (execlists->ctrl_reg)
		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
}

static bool ctx_single_port_submission(const struct intel_context *ce)
{
	return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
		intel_context_force_single_submission(ce));
}

static bool can_merge_ctx(const struct intel_context *prev,
			  const struct intel_context *next)
{
	if (prev != next)
		return false;

	if (ctx_single_port_submission(prev))
		return false;

	return true;
}

static unsigned long i915_request_flags(const struct i915_request *rq)
{
	return READ_ONCE(rq->fence.flags);
}

static bool can_merge_rq(const struct i915_request *prev,
			 const struct i915_request *next)
{
	GEM_BUG_ON(prev == next);
	GEM_BUG_ON(!assert_priority_queue(prev, next));

	/*
	 * We do not submit known completed requests. Therefore if the next
	 * request is already completed, we can pretend to merge it in
	 * with the previous context (and we will skip updating the ELSP
	 * and tracking). Thus hopefully keeping the ELSP full with active
	 * contexts, despite the best efforts of preempt-to-busy to confuse
	 * us.
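	 *
	 * The checks that follow mirror the dequeue rules: a NOPREEMPT or
	 * SENTINEL request is never merged with its neighbour, and only
	 * requests from the same (mergeable) context may share a port.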
	 */
	if (__i915_request_is_complete(next))
		return true;

	if (unlikely((i915_request_flags(prev) | i915_request_flags(next)) &
		     (BIT(I915_FENCE_FLAG_NOPREEMPT) |
		      BIT(I915_FENCE_FLAG_SENTINEL))))
		return false;

	if (!can_merge_ctx(prev->context, next->context))
		return false;

	GEM_BUG_ON(i915_seqno_passed(prev->fence.seqno, next->fence.seqno));
	return true;
}

static bool virtual_matches(const struct virtual_engine *ve,
			    const struct i915_request *rq,
			    const struct intel_engine_cs *engine)
{
	const struct intel_engine_cs *inflight;

	if (!rq)
		return false;

	if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */
		return false;

	/*
	 * We track when the HW has completed saving the context image
	 * (i.e. when we have seen the final CS event switching out of
	 * the context) and must not overwrite the context image before
	 * then. This restricts us to only using the active engine
	 * while the previous virtualized request is inflight (so
	 * we reuse the register offsets). This is a very small
	 * hysteresis on the greedy selection algorithm.
	 */
	inflight = intel_context_inflight(&ve->context);
	if (inflight && inflight != engine)
		return false;

	return true;
}

static struct virtual_engine *
first_virtual_engine(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists *el = &engine->execlists;
	struct rb_node *rb = rb_first_cached(&el->virtual);

	while (rb) {
		struct virtual_engine *ve =
			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
		struct i915_request *rq = READ_ONCE(ve->request);

		/* lazily cleanup after another engine handled rq */
		if (!rq || !virtual_matches(ve, rq, engine)) {
			rb_erase_cached(rb, &el->virtual);
			RB_CLEAR_NODE(rb);
			rb = rb_first_cached(&el->virtual);
			continue;
		}

		return ve;
	}

	return NULL;
}

static void virtual_xfer_context(struct virtual_engine *ve,
				 struct intel_engine_cs *engine)
{
	unsigned int n;

	if (likely(engine == ve->siblings[0]))
		return;

	GEM_BUG_ON(READ_ONCE(ve->context.inflight));
	if (!intel_engine_has_relative_mmio(engine))
		lrc_update_offsets(&ve->context, engine);

	/*
	 * Move the bound engine to the top of the list for
	 * future execution. We then kick this tasklet first
	 * before checking others, so that we preferentially
	 * reuse this set of bound registers.
	 */
	for (n = 1; n < ve->num_siblings; n++) {
		if (ve->siblings[n] == engine) {
			swap(ve->siblings[n], ve->siblings[0]);
			break;
		}
	}
}

static void defer_request(struct i915_request *rq, struct list_head * const pl)
{
	LIST_HEAD(list);

	/*
	 * We want to move the interrupted request to the back of
	 * the round-robin list (i.e. its priority level), but
	 * in doing so, we must then move all requests that were in
	 * flight and were waiting for the interrupted request to
	 * be run after it again.
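	 *
	 * Illustration: if request A is deferred, any ready waiter B of A
	 * on the same engine (necessarily at the same priority) is moved
	 * behind it, so A still reaches the HW before B.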
	 */
	do {
		struct i915_dependency *p;

		GEM_BUG_ON(i915_request_is_active(rq));
		list_move_tail(&rq->sched.link, pl);

		for_each_waiter(p, rq) {
			struct i915_request *w =
				container_of(p->waiter, typeof(*w), sched);

			if (p->flags & I915_DEPENDENCY_WEAK)
				continue;

			/* Leave semaphores spinning on the other engines */
			if (w->engine != rq->engine)
				continue;

			/* No waiter should start before its signaler */
			GEM_BUG_ON(i915_request_has_initial_breadcrumb(w) &&
				   __i915_request_has_started(w) &&
				   !__i915_request_is_complete(rq));

			if (!i915_request_is_ready(w))
				continue;

			if (rq_prio(w) < rq_prio(rq))
				continue;

			GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
			GEM_BUG_ON(i915_request_is_active(w));
			list_move_tail(&w->sched.link, &list);
		}

		rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
	} while (rq);
}

static void defer_active(struct intel_engine_cs *engine)
{
	struct i915_request *rq;

	rq = __unwind_incomplete_requests(engine);
	if (!rq)
		return;

	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
}

static bool
timeslice_yield(const struct intel_engine_execlists *el,
		const struct i915_request *rq)
{
	/*
	 * Once bitten, forever smitten!
	 *
	 * If the active context ever busy-waited on a semaphore,
	 * it will be treated as a hog until the end of its timeslice (i.e.
	 * until it is scheduled out and replaced by a new submission,
	 * possibly even its own lite-restore). The HW only sends an interrupt
	 * on the first miss, and we do not know if that semaphore has been
	 * signaled, or even if it is now stuck on another semaphore. Play
	 * safe, yield if it might be stuck -- it will be given a fresh
	 * timeslice in the near future.
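	 *
	 * timeslice_expired() treats such a yield exactly like an expired
	 * timer, so the suspected hog is scheduled out at the next
	 * opportunity.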
	 */
	return rq->context->lrc.ccid == READ_ONCE(el->yield);
}

static bool needs_timeslice(const struct intel_engine_cs *engine,
			    const struct i915_request *rq)
{
	if (!intel_engine_has_timeslices(engine))
		return false;

	/* If not currently active, or about to switch, wait for next event */
	if (!rq || __i915_request_is_complete(rq))
		return false;

	/* We do not need to start the timeslice until after the ACK */
	if (READ_ONCE(engine->execlists.pending[0]))
		return false;

	/* If ELSP[1] is occupied, always check to see if worth slicing */
	if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests)) {
		ENGINE_TRACE(engine, "timeslice required for second inflight context\n");
		return true;
	}

	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)) {
		ENGINE_TRACE(engine, "timeslice required for queue\n");
		return true;
	}

	if (!RB_EMPTY_ROOT(&engine->execlists.virtual.rb_root)) {
		ENGINE_TRACE(engine, "timeslice required for virtual\n");
		return true;
	}

	return false;
}

static bool
timeslice_expired(struct intel_engine_cs *engine, const struct i915_request *rq)
{
	const struct intel_engine_execlists *el = &engine->execlists;

	if (i915_request_has_nopreempt(rq) && __i915_request_has_started(rq))
		return false;

	if (!needs_timeslice(engine, rq))
		return false;

	return timer_expired(&el->timer) || timeslice_yield(el, rq);
}

static unsigned long timeslice(const struct intel_engine_cs *engine)
{
	return READ_ONCE(engine->props.timeslice_duration_ms);
}

static void start_timeslice(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists *el = &engine->execlists;
	unsigned long duration;

	/* Disable the timer if there is nothing to switch to */
	duration = 0;
	if (needs_timeslice(engine, *el->active)) {
		/* Avoid continually prolonging an active timeslice */
		if (timer_active(&el->timer)) {
			/*
			 * If we just submitted a new ELSP after an old
			 * context, that context may have already consumed
			 * its timeslice, so recheck.
			 */
			if (!timer_pending(&el->timer))
				tasklet_hi_schedule(&el->tasklet);
			return;
		}

		duration = timeslice(engine);
	}

	set_timer_ms(&el->timer, duration);
}

static void record_preemption(struct intel_engine_execlists *execlists)
{
	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
}

static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
					    const struct i915_request *rq)
{
	if (!rq)
		return 0;

	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
		return 1;

	return READ_ONCE(engine->props.preempt_timeout_ms);
}

static void set_preempt_timeout(struct intel_engine_cs *engine,
				const struct i915_request *rq)
{
	if (!intel_engine_has_preempt_reset(engine))
		return;

	set_timer_ms(&engine->execlists.preempt,
		     active_preempt_timeout(engine, rq));
}

static bool completed(const struct i915_request *rq)
{
	if (i915_request_has_sentinel(rq))
		return false;

	return __i915_request_is_complete(rq);
}

static void execlists_dequeue(struct intel_engine_cs *engine)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	struct i915_request **port = execlists->pending;
	struct i915_request ** const last_port = port + execlists->port_mask;
	struct i915_request *last, * const *active;
	struct virtual_engine *ve;
	struct rb_node *rb;
	bool submit = false;

	/*
	 * Hardware submission is through 2 ports. Conceptually each port
	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
	 * static for a context, and unique to each, so we only execute
	 * requests belonging to a single context from each ring. RING_HEAD
	 * is maintained by the CS in the context image, it marks the place
	 * where it got up to last time, and through RING_TAIL we tell the CS
	 * where we want to execute up to this time.
	 *
	 * In this list the requests are in order of execution. Consecutive
	 * requests from the same context are adjacent in the ringbuffer. We
	 * can combine these requests into a single RING_TAIL update:
	 *
	 *	RING_HEAD...req1...req2
	 *			    ^- RING_TAIL
	 * since to execute req2 the CS must first execute req1.
	 *
	 * Our goal then is to point each port to the end of a consecutive
	 * sequence of requests as being the most optimal (fewest wake ups
	 * and context switches) submission.
	 */

	spin_lock(&engine->active.lock);

	/*
	 * If the queue is higher priority than the last
	 * request in the currently active context, submit afresh.
	 * We will resubmit again afterwards in case we need to split
	 * the active context to interject the preemption request,
	 * i.e. we will retrigger preemption following the ack in case
	 * of trouble.
	 */
	active = execlists->active;
	while ((last = *active) && completed(last))
		active++;

	if (last) {
		if (need_preempt(engine, last)) {
			ENGINE_TRACE(engine,
				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
				     last->fence.context,
				     last->fence.seqno,
				     last->sched.attr.priority,
				     execlists->queue_priority_hint);
			record_preemption(execlists);

			/*
			 * Don't let the RING_HEAD advance past the breadcrumb
			 * as we unwind (and until we resubmit) so that we do
			 * not accidentally tell it to go backwards.
			 */
			ring_set_paused(engine, 1);

			/*
			 * Note that we have not stopped the GPU at this point,
			 * so we are unwinding the incomplete requests as they
			 * remain inflight and so by the time we do complete
			 * the preemption, some of the unwound requests may
			 * complete!
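			 *
			 * Requests that do complete while unwound are not
			 * replayed; they are simply skipped again when the
			 * queue is next dequeued.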
			 */
			__unwind_incomplete_requests(engine);

			last = NULL;
		} else if (timeslice_expired(engine, last)) {
			ENGINE_TRACE(engine,
				     "expired:%s last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
				     yesno(timer_expired(&execlists->timer)),
				     last->fence.context, last->fence.seqno,
				     rq_prio(last),
				     execlists->queue_priority_hint,
				     yesno(timeslice_yield(execlists, last)));

			/*
			 * Consume this timeslice; ensure we start a new one.
			 *
			 * The timeslice expired, and we will unwind the
			 * running contexts and recompute the next ELSP.
			 * If that submit will be the same pair of contexts
			 * (due to dependency ordering), we will skip the
			 * submission. If we don't cancel the timer now,
			 * we will see that the timer has expired and
			 * reschedule the tasklet; continually until the
			 * next context switch or other preemption event.
			 *
			 * Since we have decided to reschedule based on
			 * consumption of this timeslice, if we submit the
			 * same context again, grant it a full timeslice.
			 */
			cancel_timer(&execlists->timer);
			ring_set_paused(engine, 1);
			defer_active(engine);

			/*
			 * Unlike for preemption, if we rewind and continue
			 * executing the same context as previously active,
			 * the order of execution will remain the same and
			 * the tail will only advance. We do not need to
			 * force a full context restore, as a lite-restore
			 * is sufficient to resample the monotonic TAIL.
			 *
			 * If we switch to any other context, similarly we
			 * will not rewind TAIL of current context, and
			 * normal save/restore will preserve state and allow
			 * us to later continue executing the same request.
			 */
			last = NULL;
		} else {
			/*
			 * Otherwise if we already have a request pending
			 * for execution after the current one, we can
			 * just wait until the next CS event before
			 * queuing more. In either case we will force a
			 * lite-restore preemption event, but if we wait
			 * we hopefully coalesce several updates into a single
			 * submission.
			 */
			if (active[1]) {
				/*
				 * Even if ELSP[1] is occupied and not worthy
				 * of timeslices, our queue might be.
				 */
				spin_unlock(&engine->active.lock);
				return;
			}
		}
	}

	/* XXX virtual is always taking precedence */
	while ((ve = first_virtual_engine(engine))) {
		struct i915_request *rq;

		spin_lock(&ve->base.active.lock);

		rq = ve->request;
		if (unlikely(!virtual_matches(ve, rq, engine)))
			goto unlock; /* lost the race to a sibling */

		GEM_BUG_ON(rq->engine != &ve->base);
		GEM_BUG_ON(rq->context != &ve->context);

		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
			spin_unlock(&ve->base.active.lock);
			break;
		}

		if (last && !can_merge_rq(last, rq)) {
			spin_unlock(&ve->base.active.lock);
			spin_unlock(&engine->active.lock);
			return; /* leave this for another sibling */
		}

		ENGINE_TRACE(engine,
			     "virtual rq=%llx:%lld%s, new engine? %s\n",
			     rq->fence.context,
			     rq->fence.seqno,
			     __i915_request_is_complete(rq) ? "!" :
			     __i915_request_has_started(rq) ? "*" :
"*" : 1403 "", 1404 yesno(engine != ve->siblings[0])); 1405 1406 WRITE_ONCE(ve->request, NULL); 1407 WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN); 1408 1409 rb = &ve->nodes[engine->id].rb; 1410 rb_erase_cached(rb, &execlists->virtual); 1411 RB_CLEAR_NODE(rb); 1412 1413 GEM_BUG_ON(!(rq->execution_mask & engine->mask)); 1414 WRITE_ONCE(rq->engine, engine); 1415 1416 if (__i915_request_submit(rq)) { 1417 /* 1418 * Only after we confirm that we will submit 1419 * this request (i.e. it has not already 1420 * completed), do we want to update the context. 1421 * 1422 * This serves two purposes. It avoids 1423 * unnecessary work if we are resubmitting an 1424 * already completed request after timeslicing. 1425 * But more importantly, it prevents us altering 1426 * ve->siblings[] on an idle context, where 1427 * we may be using ve->siblings[] in 1428 * virtual_context_enter / virtual_context_exit. 1429 */ 1430 virtual_xfer_context(ve, engine); 1431 GEM_BUG_ON(ve->siblings[0] != engine); 1432 1433 submit = true; 1434 last = rq; 1435 } 1436 1437 i915_request_put(rq); 1438 unlock: 1439 spin_unlock(&ve->base.active.lock); 1440 1441 /* 1442 * Hmm, we have a bunch of virtual engine requests, 1443 * but the first one was already completed (thanks 1444 * preempt-to-busy!). Keep looking at the veng queue 1445 * until we have no more relevant requests (i.e. 1446 * the normal submit queue has higher priority). 1447 */ 1448 if (submit) 1449 break; 1450 } 1451 1452 while ((rb = rb_first_cached(&execlists->queue))) { 1453 struct i915_priolist *p = to_priolist(rb); 1454 struct i915_request *rq, *rn; 1455 1456 priolist_for_each_request_consume(rq, rn, p) { 1457 bool merge = true; 1458 1459 /* 1460 * Can we combine this request with the current port? 1461 * It has to be the same context/ringbuffer and not 1462 * have any exceptions (e.g. GVT saying never to 1463 * combine contexts). 1464 * 1465 * If we can combine the requests, we can execute both 1466 * by updating the RING_TAIL to point to the end of the 1467 * second request, and so we never need to tell the 1468 * hardware about the first. 1469 */ 1470 if (last && !can_merge_rq(last, rq)) { 1471 /* 1472 * If we are on the second port and cannot 1473 * combine this request with the last, then we 1474 * are done. 1475 */ 1476 if (port == last_port) 1477 goto done; 1478 1479 /* 1480 * We must not populate both ELSP[] with the 1481 * same LRCA, i.e. we must submit 2 different 1482 * contexts if we submit 2 ELSP. 1483 */ 1484 if (last->context == rq->context) 1485 goto done; 1486 1487 if (i915_request_has_sentinel(last)) 1488 goto done; 1489 1490 /* 1491 * We avoid submitting virtual requests into 1492 * the secondary ports so that we can migrate 1493 * the request immediately to another engine 1494 * rather than wait for the primary request. 1495 */ 1496 if (rq->execution_mask != engine->mask) 1497 goto done; 1498 1499 /* 1500 * If GVT overrides us we only ever submit 1501 * port[0], leaving port[1] empty. Note that we 1502 * also have to be careful that we don't queue 1503 * the same context (even though a different 1504 * request) to the second port. 
				 */
				if (ctx_single_port_submission(last->context) ||
				    ctx_single_port_submission(rq->context))
					goto done;

				merge = false;
			}

			if (__i915_request_submit(rq)) {
				if (!merge) {
					*port++ = i915_request_get(last);
					last = NULL;
				}

				GEM_BUG_ON(last &&
					   !can_merge_ctx(last->context,
							  rq->context));
				GEM_BUG_ON(last &&
					   i915_seqno_passed(last->fence.seqno,
							     rq->fence.seqno));

				submit = true;
				last = rq;
			}
		}

		rb_erase_cached(&p->node, &execlists->queue);
		i915_priolist_free(p);
	}
done:
	*port++ = i915_request_get(last);

	/*
	 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer.
	 *
	 * We choose the priority hint such that if we add a request of greater
	 * priority than this, we kick the submission tasklet to decide on
	 * the right order of submitting the requests to hardware. We must
	 * also be prepared to reorder requests as they are in-flight on the
	 * HW. We derive the priority hint then as the first "hole" in
	 * the HW submission ports and if there are no available slots,
	 * the priority of the lowest executing request, i.e. last.
	 *
	 * When we do receive a higher priority request ready to run from the
	 * user, see queue_request(), the priority hint is bumped to that
	 * request triggering preemption on the next dequeue (or subsequent
	 * interrupt for secondary ports).
	 */
	execlists->queue_priority_hint = queue_prio(execlists);
	spin_unlock(&engine->active.lock);

	/*
	 * We can skip poking the HW if we ended up with exactly the same set
	 * of requests as currently running, e.g. trying to timeslice a pair
	 * of ordered contexts.
	 */
	if (submit &&
	    memcmp(active,
		   execlists->pending,
		   (port - execlists->pending) * sizeof(*port))) {
		*port = NULL;
		while (port-- != execlists->pending)
			execlists_schedule_in(*port, port - execlists->pending);

		WRITE_ONCE(execlists->yield, -1);
		set_preempt_timeout(engine, *active);
		execlists_submit_ports(engine);
	} else {
		ring_set_paused(engine, 0);
		while (port-- != execlists->pending)
			i915_request_put(*port);
		*execlists->pending = NULL;
	}
}

static void execlists_dequeue_irq(struct intel_engine_cs *engine)
{
	local_irq_disable(); /* Suspend interrupts across request submission */
	execlists_dequeue(engine);
	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
}

static void clear_ports(struct i915_request **ports, int count)
{
	memset_p((void **)ports, NULL, count);
}

static void
copy_ports(struct i915_request **dst, struct i915_request **src, int count)
{
	/* A memcpy_p() would be very useful here! */
	while (count--)
		WRITE_ONCE(*dst++, *src++); /* avoid write tearing */
}

static struct i915_request **
cancel_port_requests(struct intel_engine_execlists * const execlists,
		     struct i915_request **inactive)
{
	struct i915_request * const *port;

	for (port = execlists->pending; *port; port++)
		*inactive++ = *port;
	clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending));

	/* Mark the end of active before we overwrite *active */
	for (port = xchg(&execlists->active, execlists->pending); *port; port++)
		*inactive++ = *port;
	clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight));

	smp_wmb(); /* complete the seqlock for execlists_active() */
	WRITE_ONCE(execlists->active, execlists->inflight);

	/* Having cancelled all outstanding process_csb(), stop their timers */
	GEM_BUG_ON(execlists->pending[0]);
	cancel_timer(&execlists->timer);
	cancel_timer(&execlists->preempt);

	return inactive;
}

static void invalidate_csb_entries(const u64 *first, const u64 *last)
{
	clflush((void *)first);
	clflush((void *)last);
}

/*
 * Starting with Gen12, the status has a new format:
 *
 *     bit  0:     switched to new queue
 *     bit  1:     reserved
 *     bit  2:     semaphore wait mode (poll or signal), only valid when
 *                 switch detail is set to "wait on semaphore"
 *     bits 3-5:   engine class
 *     bits 6-11:  engine instance
 *     bits 12-14: reserved
 *     bits 15-25: sw context id of the lrc the GT switched to
 *     bits 26-31: sw counter of the lrc the GT switched to
 *     bits 32-35: context switch detail
 *                  - 0: ctx complete
 *                  - 1: wait on sync flip
 *                  - 2: wait on vblank
 *                  - 3: wait on scanline
 *                  - 4: wait on semaphore
 *                  - 5: context preempted (not on SEMAPHORE_WAIT or
 *                       WAIT_FOR_EVENT)
 *     bit  36:    reserved
 *     bits 37-43: wait detail (for switch detail 1 to 4)
 *     bits 44-46: reserved
 *     bits 47-57: sw context id of the lrc the GT switched away from
 *     bits 58-63: sw counter of the lrc the GT switched away from
 */
static bool gen12_csb_parse(const u64 csb)
{
	bool ctx_away_valid = GEN12_CSB_CTX_VALID(upper_32_bits(csb));
	bool new_queue =
		lower_32_bits(csb) & GEN12_CTX_STATUS_SWITCHED_TO_NEW_QUEUE;

	/*
	 * The context switch detail is not guaranteed to be 5 when a preemption
	 * occurs, so we can't just check for that. The check below works for
	 * all the cases we care about, including preemptions of WAIT
	 * instructions and lite-restore. Preempt-to-idle via the CTRL register
	 * would require some extra handling, but we don't support that.
	 */
	if (!ctx_away_valid || new_queue) {
		GEM_BUG_ON(!GEN12_CSB_CTX_VALID(lower_32_bits(csb)));
		return true;
	}

	/*
	 * switch detail = 5 is covered by the case above and we do not expect a
	 * context switch on an unsuccessful wait instruction since we always
	 * use polling mode.
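	 *
	 * As a worked example: an event whose outgoing sw context id reads
	 * as GEN12_IDLE_CTX_ID (so ctx_away_valid is false) is treated as a
	 * promotion by the check above, whatever the switch detail says.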
	 */
	GEM_BUG_ON(GEN12_CTX_SWITCH_DETAIL(upper_32_bits(csb)));
	return false;
}

static bool gen8_csb_parse(const u64 csb)
{
	return csb & (GEN8_CTX_STATUS_IDLE_ACTIVE | GEN8_CTX_STATUS_PREEMPTED);
}

static noinline u64
wa_csb_read(const struct intel_engine_cs *engine, u64 * const csb)
{
	u64 entry;

	/*
	 * Reading from the HWSP has one particular advantage: we can detect
	 * a stale entry. Since the write into HWSP is broken, we have no reason
	 * to trust the HW at all, the mmio entry may equally be unordered, so
	 * we prefer the path that is self-checking and as a last resort,
	 * return the mmio value.
	 *
	 * tgl,dg1:HSDES#22011327657
	 */
	preempt_disable();
	if (wait_for_atomic_us((entry = READ_ONCE(*csb)) != -1, 10)) {
		int idx = csb - engine->execlists.csb_status;
		int status;

		status = GEN8_EXECLISTS_STATUS_BUF;
		if (idx >= 6) {
			status = GEN11_EXECLISTS_STATUS_BUF2;
			idx -= 6;
		}
		status += sizeof(u64) * idx;

		entry = intel_uncore_read64(engine->uncore,
					    _MMIO(engine->mmio_base + status));
	}
	preempt_enable();

	return entry;
}

static u64 csb_read(const struct intel_engine_cs *engine, u64 * const csb)
{
	u64 entry = READ_ONCE(*csb);

	/*
	 * Unfortunately, the GPU does not always serialise its write
	 * of the CSB entries before its write of the CSB pointer, at least
	 * from the perspective of the CPU, using what is known as a Global
	 * Observation Point. We may read a new CSB tail pointer, but then
	 * read the stale CSB entries, causing us to misinterpret the
	 * context-switch events, and eventually declare the GPU hung.
	 *
	 * icl:HSDES#1806554093
	 * tgl:HSDES#22011248461
	 */
	if (unlikely(entry == -1))
		entry = wa_csb_read(engine, csb);

	/* Consume this entry so that we can spot its future reuse. */
	WRITE_ONCE(*csb, -1);

	/* ELSP is an implicit wmb() before the GPU wraps and overwrites csb */
	return entry;
}

static void new_timeslice(struct intel_engine_execlists *el)
{
	/* By cancelling, we will start afresh in start_timeslice() */
	cancel_timer(&el->timer);
}

static struct i915_request **
process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	u64 * const buf = execlists->csb_status;
	const u8 num_entries = execlists->csb_size;
	struct i915_request **prev;
	u8 head, tail;

	/*
	 * As we modify our execlists state tracking we require exclusive
	 * access. Either we are inside the tasklet, or the tasklet is disabled
	 * and we assume that is only inside the reset paths and so serialised.
	 */
	GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
		   !reset_in_progress(execlists));
	GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));

	/*
	 * Note that csb_write, csb_status may be either in HWSP or mmio.
	 * When reading from the csb_write mmio register, we have to be
	 * careful to only use the GEN8_CSB_WRITE_PTR portion, which is
	 * the low 4 bits.
As it happens we know the next 4bits are always 1778 * zero and so we can simply mask off the low u8 of the register 1779 * and treat it identically to reading from the HWSP (without having 1780 * to use explicit shifting and masking, and probably bifurcating 1781 * the code to handle the legacy mmio read). 1782 */ 1783 head = execlists->csb_head; 1784 tail = READ_ONCE(*execlists->csb_write); 1785 if (unlikely(head == tail)) 1786 return inactive; 1787 1788 /* 1789 * We will consume all events from HW, or at least pretend to. 1790 * 1791 * The sequence of events from the HW is deterministic, and derived 1792 * from our writes to the ELSP, with a smidgen of variability for 1793 * the arrival of the asynchronous requests wrt the inflight 1794 * execution. If the HW sends an event that does not correspond with 1795 * the one we are expecting, we have to abandon all hope as we lose 1796 * all tracking of what the engine is actually executing. We will 1797 * only detect we are out of sequence with the HW when we get an 1798 * 'impossible' event because we have already drained our own 1799 * preemption/promotion queue. If this occurs, we know that we likely 1800 * lost track of execution earlier and must unwind and restart; the 1801 * simplest way is to stop processing the event queue and force the 1802 * engine to reset. 1803 */ 1804 execlists->csb_head = tail; 1805 ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail); 1806 1807 /* 1808 * Hopefully paired with a wmb() in HW! 1809 * 1810 * We must complete the read of the write pointer before any reads 1811 * from the CSB, so that we do not see stale values. Without an rmb 1812 * (lfence) the HW may speculatively perform the CSB[] reads *before* 1813 * we perform the READ_ONCE(*csb_write). 1814 */ 1815 rmb(); 1816 1817 /* Remember who was last running under the timer */ 1818 prev = inactive; 1819 *prev = NULL; 1820 1821 do { 1822 bool promote; 1823 u64 csb; 1824 1825 if (++head == num_entries) 1826 head = 0; 1827 1828 /* 1829 * We are flying near dragons again. 1830 * 1831 * We hold a reference to the request in execlist_port[] 1832 * but no more than that. We are operating in softirq 1833 * context and so cannot hold any mutex or sleep. That 1834 * prevents us stopping the requests we are processing 1835 * in port[] from being retired simultaneously (the 1836 * breadcrumb will be complete before we see the 1837 * context-switch). As we only hold the reference to the 1838 * request, any pointer chasing underneath the request 1839 * is subject to a potential use-after-free. Thus we 1840 * store all of the bookkeeping within port[] as 1841 * required, and avoid using unguarded pointers beneath 1842 * request itself. The same applies to the atomic 1843 * status notifier.
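 *
 * In practice that means all we do in this loop is stash the bare
 * i915_request pointers into the caller-provided inactive[] array;
 * the actual retirement (execlists_schedule_out()) is deferred to
 * post_process_csb(), run by the caller once we are out of this loop.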
1844 */ 1845 1846 csb = csb_read(engine, buf + head); 1847 ENGINE_TRACE(engine, "csb[%d]: status=0x%08x:0x%08x\n", 1848 head, upper_32_bits(csb), lower_32_bits(csb)); 1849 1850 if (INTEL_GEN(engine->i915) >= 12) 1851 promote = gen12_csb_parse(csb); 1852 else 1853 promote = gen8_csb_parse(csb); 1854 if (promote) { 1855 struct i915_request * const *old = execlists->active; 1856 1857 if (GEM_WARN_ON(!*execlists->pending)) { 1858 execlists->error_interrupt |= ERROR_CSB; 1859 break; 1860 } 1861 1862 ring_set_paused(engine, 0); 1863 1864 /* Point active to the new ELSP; prevent overwriting */ 1865 WRITE_ONCE(execlists->active, execlists->pending); 1866 smp_wmb(); /* notify execlists_active() */ 1867 1868 /* cancel old inflight, prepare for switch */ 1869 trace_ports(execlists, "preempted", old); 1870 while (*old) 1871 *inactive++ = *old++; 1872 1873 /* switch pending to inflight */ 1874 GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); 1875 copy_ports(execlists->inflight, 1876 execlists->pending, 1877 execlists_num_ports(execlists)); 1878 smp_wmb(); /* complete the seqlock */ 1879 WRITE_ONCE(execlists->active, execlists->inflight); 1880 1881 /* XXX Magic delay for tgl */ 1882 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 1883 1884 WRITE_ONCE(execlists->pending[0], NULL); 1885 } else { 1886 if (GEM_WARN_ON(!*execlists->active)) { 1887 execlists->error_interrupt |= ERROR_CSB; 1888 break; 1889 } 1890 1891 /* port0 completed, advanced to port1 */ 1892 trace_ports(execlists, "completed", execlists->active); 1893 1894 /* 1895 * We rely on the hardware being strongly 1896 * ordered, that the breadcrumb write is 1897 * coherent (visible from the CPU) before the 1898 * user interrupt is processed. One might assume 1899 * that the breadcrumb write being before the 1900 * user interrupt and the CS event for the context 1901 * switch would therefore be before the CS event 1902 * itself... 1903 */ 1904 if (GEM_SHOW_DEBUG() && 1905 !__i915_request_is_complete(*execlists->active)) { 1906 struct i915_request *rq = *execlists->active; 1907 const u32 *regs __maybe_unused = 1908 rq->context->lrc_reg_state; 1909 1910 ENGINE_TRACE(engine, 1911 "context completed before request!\n"); 1912 ENGINE_TRACE(engine, 1913 "ring:{start:0x%08x, head:%04x, tail:%04x, ctl:%08x, mode:%08x}\n", 1914 ENGINE_READ(engine, RING_START), 1915 ENGINE_READ(engine, RING_HEAD) & HEAD_ADDR, 1916 ENGINE_READ(engine, RING_TAIL) & TAIL_ADDR, 1917 ENGINE_READ(engine, RING_CTL), 1918 ENGINE_READ(engine, RING_MI_MODE)); 1919 ENGINE_TRACE(engine, 1920 "rq:{start:%08x, head:%04x, tail:%04x, seqno:%llx:%d, hwsp:%d}, ", 1921 i915_ggtt_offset(rq->ring->vma), 1922 rq->head, rq->tail, 1923 rq->fence.context, 1924 lower_32_bits(rq->fence.seqno), 1925 hwsp_seqno(rq)); 1926 ENGINE_TRACE(engine, 1927 "ctx:{start:%08x, head:%04x, tail:%04x}, ", 1928 regs[CTX_RING_START], 1929 regs[CTX_RING_HEAD], 1930 regs[CTX_RING_TAIL]); 1931 } 1932 1933 *inactive++ = *execlists->active++; 1934 1935 GEM_BUG_ON(execlists->active - execlists->inflight > 1936 execlists_num_ports(execlists)); 1937 } 1938 } while (head != tail); 1939 1940 /* 1941 * Gen11 has proven to fail wrt global observation point between 1942 * entry and tail update, failing on the ordering and thus 1943 * we see an old entry in the context status buffer. 1944 * 1945 * Forcibly evict out entries for the next gpu csb update, 1946 * to increase the odds that we get a fresh entries with non 1947 * working hardware. 
The cost for doing so comes out mostly in 1948 * the wash as hardware, working or not, will need to do the 1949 * invalidation before. 1950 */ 1951 invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); 1952 1953 /* 1954 * We assume that any event reflects a change in context flow 1955 * and merits a fresh timeslice. We reinstall the timer after 1956 * inspecting the queue to see if we need to resubmit. 1957 */ 1958 if (*prev != *execlists->active) /* elide lite-restores */ 1959 new_timeslice(execlists); 1960 1961 return inactive; 1962 } 1963 1964 static void post_process_csb(struct i915_request **port, 1965 struct i915_request **last) 1966 { 1967 while (port != last) 1968 execlists_schedule_out(*port++); 1969 } 1970 1971 static void __execlists_hold(struct i915_request *rq) 1972 { 1973 LIST_HEAD(list); 1974 1975 do { 1976 struct i915_dependency *p; 1977 1978 if (i915_request_is_active(rq)) 1979 __i915_request_unsubmit(rq); 1980 1981 clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 1982 list_move_tail(&rq->sched.link, &rq->engine->active.hold); 1983 i915_request_set_hold(rq); 1984 RQ_TRACE(rq, "on hold\n"); 1985 1986 for_each_waiter(p, rq) { 1987 struct i915_request *w = 1988 container_of(p->waiter, typeof(*w), sched); 1989 1990 if (p->flags & I915_DEPENDENCY_WEAK) 1991 continue; 1992 1993 /* Leave semaphores spinning on the other engines */ 1994 if (w->engine != rq->engine) 1995 continue; 1996 1997 if (!i915_request_is_ready(w)) 1998 continue; 1999 2000 if (__i915_request_is_complete(w)) 2001 continue; 2002 2003 if (i915_request_on_hold(w)) 2004 continue; 2005 2006 list_move_tail(&w->sched.link, &list); 2007 } 2008 2009 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2010 } while (rq); 2011 } 2012 2013 static bool execlists_hold(struct intel_engine_cs *engine, 2014 struct i915_request *rq) 2015 { 2016 if (i915_request_on_hold(rq)) 2017 return false; 2018 2019 spin_lock_irq(&engine->active.lock); 2020 2021 if (__i915_request_is_complete(rq)) { /* too late! */ 2022 rq = NULL; 2023 goto unlock; 2024 } 2025 2026 /* 2027 * Transfer this request onto the hold queue to prevent it 2028 * being resubmitted to HW (and potentially completed) before we have 2029 * released it. Since we may have already submitted following 2030 * requests, we need to remove those as well. 2031 */ 2032 GEM_BUG_ON(i915_request_on_hold(rq)); 2033 GEM_BUG_ON(rq->engine != engine); 2034 __execlists_hold(rq); 2035 GEM_BUG_ON(list_empty(&engine->active.hold)); 2036 2037 unlock: 2038 spin_unlock_irq(&engine->active.lock); 2039 return rq; 2040 } 2041 2042 static bool hold_request(const struct i915_request *rq) 2043 { 2044 struct i915_dependency *p; 2045 bool result = false; 2046 2047 /* 2048 * If one of our ancestors is on hold, we must also be on hold, 2049 * otherwise we will bypass it and execute before it.
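 *
 * An illustrative sequence (not a real trace) of how the hold machinery
 * above and below fits together:
 *
 *	execlists_hold(engine, A)   - A and its ready same-engine waiters
 *	                              are unsubmitted onto engine->active.hold
 *	submit B (waiting on A)     - ancestor_on_hold() uses this check to
 *	                              park B on engine->active.hold as well
 *	execlists_unhold(engine, A) - the held requests return to the
 *	                              priority queue (kicking the tasklet if
 *	                              they now have the highest priority)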
2050 */ 2051 rcu_read_lock(); 2052 for_each_signaler(p, rq) { 2053 const struct i915_request *s = 2054 container_of(p->signaler, typeof(*s), sched); 2055 2056 if (s->engine != rq->engine) 2057 continue; 2058 2059 result = i915_request_on_hold(s); 2060 if (result) 2061 break; 2062 } 2063 rcu_read_unlock(); 2064 2065 return result; 2066 } 2067 2068 static void __execlists_unhold(struct i915_request *rq) 2069 { 2070 LIST_HEAD(list); 2071 2072 do { 2073 struct i915_dependency *p; 2074 2075 RQ_TRACE(rq, "hold release\n"); 2076 2077 GEM_BUG_ON(!i915_request_on_hold(rq)); 2078 GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); 2079 2080 i915_request_clear_hold(rq); 2081 list_move_tail(&rq->sched.link, 2082 i915_sched_lookup_priolist(rq->engine, 2083 rq_prio(rq))); 2084 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2085 2086 /* Also release any children on this engine that are ready */ 2087 for_each_waiter(p, rq) { 2088 struct i915_request *w = 2089 container_of(p->waiter, typeof(*w), sched); 2090 2091 if (p->flags & I915_DEPENDENCY_WEAK) 2092 continue; 2093 2094 /* Propagate any change in error status */ 2095 if (rq->fence.error) 2096 i915_request_set_error_once(w, rq->fence.error); 2097 2098 if (w->engine != rq->engine) 2099 continue; 2100 2101 if (!i915_request_on_hold(w)) 2102 continue; 2103 2104 /* Check that no other parents are also on hold */ 2105 if (hold_request(w)) 2106 continue; 2107 2108 list_move_tail(&w->sched.link, &list); 2109 } 2110 2111 rq = list_first_entry_or_null(&list, typeof(*rq), sched.link); 2112 } while (rq); 2113 } 2114 2115 static void execlists_unhold(struct intel_engine_cs *engine, 2116 struct i915_request *rq) 2117 { 2118 spin_lock_irq(&engine->active.lock); 2119 2120 /* 2121 * Move this request back to the priority queue, and all of its 2122 * children and grandchildren that were suspended along with it. 2123 */ 2124 __execlists_unhold(rq); 2125 2126 if (rq_prio(rq) > engine->execlists.queue_priority_hint) { 2127 engine->execlists.queue_priority_hint = rq_prio(rq); 2128 tasklet_hi_schedule(&engine->execlists.tasklet); 2129 } 2130 2131 spin_unlock_irq(&engine->active.lock); 2132 } 2133 2134 struct execlists_capture { 2135 struct work_struct work; 2136 struct i915_request *rq; 2137 struct i915_gpu_coredump *error; 2138 }; 2139 2140 static void execlists_capture_work(struct work_struct *work) 2141 { 2142 struct execlists_capture *cap = container_of(work, typeof(*cap), work); 2143 const gfp_t gfp = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN; 2144 struct intel_engine_cs *engine = cap->rq->engine; 2145 struct intel_gt_coredump *gt = cap->error->gt; 2146 struct intel_engine_capture_vma *vma; 2147 2148 /* Compress all the objects attached to the request, slow! 
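 *
 * Note that we are in a workqueue (process context) here, unlike
 * capture_regs() which ran in the atomic reset path with GFP_ATOMIC;
 * that is why we can afford GFP_KERNEL | __GFP_RETRY_MAYFAIL and the
 * slow compression of the user's batch and associated buffers.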
*/ 2149 vma = intel_engine_coredump_add_request(gt->engine, cap->rq, gfp); 2150 if (vma) { 2151 struct i915_vma_compress *compress = 2152 i915_vma_capture_prepare(gt); 2153 2154 intel_engine_coredump_add_vma(gt->engine, vma, compress); 2155 i915_vma_capture_finish(gt, compress); 2156 } 2157 2158 gt->simulated = gt->engine->simulated; 2159 cap->error->simulated = gt->simulated; 2160 2161 /* Publish the error state, and announce it to the world */ 2162 i915_error_state_store(cap->error); 2163 i915_gpu_coredump_put(cap->error); 2164 2165 /* Return this request and all that depend upon it for signaling */ 2166 execlists_unhold(engine, cap->rq); 2167 i915_request_put(cap->rq); 2168 2169 kfree(cap); 2170 } 2171 2172 static struct execlists_capture *capture_regs(struct intel_engine_cs *engine) 2173 { 2174 const gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN; 2175 struct execlists_capture *cap; 2176 2177 cap = kmalloc(sizeof(*cap), gfp); 2178 if (!cap) 2179 return NULL; 2180 2181 cap->error = i915_gpu_coredump_alloc(engine->i915, gfp); 2182 if (!cap->error) 2183 goto err_cap; 2184 2185 cap->error->gt = intel_gt_coredump_alloc(engine->gt, gfp); 2186 if (!cap->error->gt) 2187 goto err_gpu; 2188 2189 cap->error->gt->engine = intel_engine_coredump_alloc(engine, gfp); 2190 if (!cap->error->gt->engine) 2191 goto err_gt; 2192 2193 cap->error->gt->engine->hung = true; 2194 2195 return cap; 2196 2197 err_gt: 2198 kfree(cap->error->gt); 2199 err_gpu: 2200 kfree(cap->error); 2201 err_cap: 2202 kfree(cap); 2203 return NULL; 2204 } 2205 2206 static struct i915_request * 2207 active_context(struct intel_engine_cs *engine, u32 ccid) 2208 { 2209 const struct intel_engine_execlists * const el = &engine->execlists; 2210 struct i915_request * const *port, *rq; 2211 2212 /* 2213 * Use the most recent result from process_csb(), but just in case 2214 * we trigger an error (via interrupt) before the first CS event has 2215 * been written, peek at the next submission. 2216 */ 2217 2218 for (port = el->active; (rq = *port); port++) { 2219 if (rq->context->lrc.ccid == ccid) { 2220 ENGINE_TRACE(engine, 2221 "ccid:%x found at active:%zd\n", 2222 ccid, port - el->active); 2223 return rq; 2224 } 2225 } 2226 2227 for (port = el->pending; (rq = *port); port++) { 2228 if (rq->context->lrc.ccid == ccid) { 2229 ENGINE_TRACE(engine, 2230 "ccid:%x found at pending:%zd\n", 2231 ccid, port - el->pending); 2232 return rq; 2233 } 2234 } 2235 2236 ENGINE_TRACE(engine, "ccid:%x not found\n", ccid); 2237 return NULL; 2238 } 2239 2240 static u32 active_ccid(struct intel_engine_cs *engine) 2241 { 2242 return ENGINE_READ_FW(engine, RING_EXECLIST_STATUS_HI); 2243 } 2244 2245 static void execlists_capture(struct intel_engine_cs *engine) 2246 { 2247 struct execlists_capture *cap; 2248 2249 if (!IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)) 2250 return; 2251 2252 /* 2253 * We need to _quickly_ capture the engine state before we reset. 2254 * We are inside an atomic section (softirq) here and we are delaying 2255 * the forced preemption event. 2256 */ 2257 cap = capture_regs(engine); 2258 if (!cap) 2259 return; 2260 2261 spin_lock_irq(&engine->active.lock); 2262 cap->rq = active_context(engine, active_ccid(engine)); 2263 if (cap->rq) { 2264 cap->rq = active_request(cap->rq->context->timeline, cap->rq); 2265 cap->rq = i915_request_get_rcu(cap->rq); 2266 } 2267 spin_unlock_irq(&engine->active.lock); 2268 if (!cap->rq) 2269 goto err_free; 2270 2271 /* 2272 * Remove the request from the execlists queue, and take ownership 2273 * of the request. 
We pass it to our worker who will _slowly_ compress 2274 * all the pages the _user_ requested for debugging their batch, after 2275 * which we return it to the queue for signaling. 2276 * 2277 * By removing them from the execlists queue, we also remove the 2278 * requests from being processed by __unwind_incomplete_requests() 2279 * during the intel_engine_reset(), and so they will *not* be replayed 2280 * afterwards. 2281 * 2282 * Note that because we have not yet reset the engine at this point, 2283 * it is possible for the request that we have identified as being 2284 * guilty, did in fact complete and we will then hit an arbitration 2285 * point allowing the outstanding preemption to succeed. The likelihood 2286 * of that is very low (as capturing of the engine registers should be 2287 * fast enough to run inside an irq-off atomic section!), so we will 2288 * simply hold that request accountable for being non-preemptible 2289 * long enough to force the reset. 2290 */ 2291 if (!execlists_hold(engine, cap->rq)) 2292 goto err_rq; 2293 2294 INIT_WORK(&cap->work, execlists_capture_work); 2295 schedule_work(&cap->work); 2296 return; 2297 2298 err_rq: 2299 i915_request_put(cap->rq); 2300 err_free: 2301 i915_gpu_coredump_put(cap->error); 2302 kfree(cap); 2303 } 2304 2305 static void execlists_reset(struct intel_engine_cs *engine, const char *msg) 2306 { 2307 const unsigned int bit = I915_RESET_ENGINE + engine->id; 2308 unsigned long *lock = &engine->gt->reset.flags; 2309 2310 if (!intel_has_reset_engine(engine->gt)) 2311 return; 2312 2313 if (test_and_set_bit(bit, lock)) 2314 return; 2315 2316 ENGINE_TRACE(engine, "reset for %s\n", msg); 2317 2318 /* Mark this tasklet as disabled to avoid waiting for it to complete */ 2319 tasklet_disable_nosync(&engine->execlists.tasklet); 2320 2321 ring_set_paused(engine, 1); /* Freeze the current request in place */ 2322 execlists_capture(engine); 2323 intel_engine_reset(engine, msg); 2324 2325 tasklet_enable(&engine->execlists.tasklet); 2326 clear_and_wake_up_bit(bit, lock); 2327 } 2328 2329 static bool preempt_timeout(const struct intel_engine_cs *const engine) 2330 { 2331 const struct timer_list *t = &engine->execlists.preempt; 2332 2333 if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT) 2334 return false; 2335 2336 if (!timer_expired(t)) 2337 return false; 2338 2339 return engine->execlists.pending[0]; 2340 } 2341 2342 /* 2343 * Check the unread Context Status Buffers and manage the submission of new 2344 * contexts to the ELSP accordingly. 2345 */ 2346 static void execlists_submission_tasklet(struct tasklet_struct *t) 2347 { 2348 struct intel_engine_cs * const engine = 2349 from_tasklet(engine, t, execlists.tasklet); 2350 struct i915_request *post[2 * EXECLIST_MAX_PORTS]; 2351 struct i915_request **inactive; 2352 2353 rcu_read_lock(); 2354 inactive = process_csb(engine, post); 2355 GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); 2356 2357 if (unlikely(preempt_timeout(engine))) { 2358 cancel_timer(&engine->execlists.preempt); 2359 engine->execlists.error_interrupt |= ERROR_PREEMPT; 2360 } 2361 2362 if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) { 2363 const char *msg; 2364 2365 /* Generate the error message in priority wrt to the user! 
*/ 2366 if (engine->execlists.error_interrupt & GENMASK(15, 0)) 2367 msg = "CS error"; /* thrown by a user payload */ 2368 else if (engine->execlists.error_interrupt & ERROR_CSB) 2369 msg = "invalid CSB event"; 2370 else if (engine->execlists.error_interrupt & ERROR_PREEMPT) 2371 msg = "preemption time out"; 2372 else 2373 msg = "internal error"; 2374 2375 engine->execlists.error_interrupt = 0; 2376 execlists_reset(engine, msg); 2377 } 2378 2379 if (!engine->execlists.pending[0]) { 2380 execlists_dequeue_irq(engine); 2381 start_timeslice(engine); 2382 } 2383 2384 post_process_csb(post, inactive); 2385 rcu_read_unlock(); 2386 } 2387 2388 static void __execlists_kick(struct intel_engine_execlists *execlists) 2389 { 2390 /* Kick the tasklet for some interrupt coalescing and reset handling */ 2391 tasklet_hi_schedule(&execlists->tasklet); 2392 } 2393 2394 #define execlists_kick(t, member) \ 2395 __execlists_kick(container_of(t, struct intel_engine_execlists, member)) 2396 2397 static void execlists_timeslice(struct timer_list *timer) 2398 { 2399 execlists_kick(timer, timer); 2400 } 2401 2402 static void execlists_preempt(struct timer_list *timer) 2403 { 2404 execlists_kick(timer, preempt); 2405 } 2406 2407 static void queue_request(struct intel_engine_cs *engine, 2408 struct i915_request *rq) 2409 { 2410 GEM_BUG_ON(!list_empty(&rq->sched.link)); 2411 list_add_tail(&rq->sched.link, 2412 i915_sched_lookup_priolist(engine, rq_prio(rq))); 2413 set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); 2414 } 2415 2416 static bool submit_queue(struct intel_engine_cs *engine, 2417 const struct i915_request *rq) 2418 { 2419 struct intel_engine_execlists *execlists = &engine->execlists; 2420 2421 if (rq_prio(rq) <= execlists->queue_priority_hint) 2422 return false; 2423 2424 execlists->queue_priority_hint = rq_prio(rq); 2425 return true; 2426 } 2427 2428 static bool ancestor_on_hold(const struct intel_engine_cs *engine, 2429 const struct i915_request *rq) 2430 { 2431 GEM_BUG_ON(i915_request_on_hold(rq)); 2432 return !list_empty(&engine->active.hold) && hold_request(rq); 2433 } 2434 2435 static void execlists_submit_request(struct i915_request *request) 2436 { 2437 struct intel_engine_cs *engine = request->engine; 2438 unsigned long flags; 2439 2440 /* Will be called from irq-context when using foreign fences. 
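 *
 * Hence the irqsave spinlock below. A rough sketch of the path that
 * follows, using the helpers defined earlier in this file:
 *
 *	ancestor_on_hold()?  -> park the request on engine->active.hold
 *	else queue_request() -> insert into the priority tree
 *	     submit_queue()  -> raise queue_priority_hint if we now preempt
 *	     __execlists_kick() -> schedule the submission tasklet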
*/ 2441 spin_lock_irqsave(&engine->active.lock, flags); 2442 2443 if (unlikely(ancestor_on_hold(engine, request))) { 2444 RQ_TRACE(request, "ancestor on hold\n"); 2445 list_add_tail(&request->sched.link, &engine->active.hold); 2446 i915_request_set_hold(request); 2447 } else { 2448 queue_request(engine, request); 2449 2450 GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); 2451 GEM_BUG_ON(list_empty(&request->sched.link)); 2452 2453 if (submit_queue(engine, request)) 2454 __execlists_kick(&engine->execlists); 2455 } 2456 2457 spin_unlock_irqrestore(&engine->active.lock, flags); 2458 } 2459 2460 static int 2461 __execlists_context_pre_pin(struct intel_context *ce, 2462 struct intel_engine_cs *engine, 2463 struct i915_gem_ww_ctx *ww, void **vaddr) 2464 { 2465 int err; 2466 2467 err = lrc_pre_pin(ce, engine, ww, vaddr); 2468 if (err) 2469 return err; 2470 2471 if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) { 2472 lrc_init_state(ce, engine, *vaddr); 2473 2474 __i915_gem_object_flush_map(ce->state->obj, 0, engine->context_size); 2475 } 2476 2477 return 0; 2478 } 2479 2480 static int execlists_context_pre_pin(struct intel_context *ce, 2481 struct i915_gem_ww_ctx *ww, 2482 void **vaddr) 2483 { 2484 return __execlists_context_pre_pin(ce, ce->engine, ww, vaddr); 2485 } 2486 2487 static int execlists_context_pin(struct intel_context *ce, void *vaddr) 2488 { 2489 return lrc_pin(ce, ce->engine, vaddr); 2490 } 2491 2492 static int execlists_context_alloc(struct intel_context *ce) 2493 { 2494 return lrc_alloc(ce, ce->engine); 2495 } 2496 2497 static const struct intel_context_ops execlists_context_ops = { 2498 .flags = COPS_HAS_INFLIGHT, 2499 2500 .alloc = execlists_context_alloc, 2501 2502 .pre_pin = execlists_context_pre_pin, 2503 .pin = execlists_context_pin, 2504 .unpin = lrc_unpin, 2505 .post_unpin = lrc_post_unpin, 2506 2507 .enter = intel_context_enter_engine, 2508 .exit = intel_context_exit_engine, 2509 2510 .reset = lrc_reset, 2511 .destroy = lrc_destroy, 2512 }; 2513 2514 static int emit_pdps(struct i915_request *rq) 2515 { 2516 const struct intel_engine_cs * const engine = rq->engine; 2517 struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(rq->context->vm); 2518 int err, i; 2519 u32 *cs; 2520 2521 GEM_BUG_ON(intel_vgpu_active(rq->engine->i915)); 2522 2523 /* 2524 * Beware ye of the dragons, this sequence is magic! 2525 * 2526 * Small changes to this sequence can cause anything from 2527 * GPU hangs to forcewake errors and machine lockups! 2528 */ 2529 2530 cs = intel_ring_begin(rq, 2); 2531 if (IS_ERR(cs)) 2532 return PTR_ERR(cs); 2533 2534 *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; 2535 *cs++ = MI_NOOP; 2536 intel_ring_advance(rq, cs); 2537 2538 /* Flush any residual operations from the context load */ 2539 err = engine->emit_flush(rq, EMIT_FLUSH); 2540 if (err) 2541 return err; 2542 2543 /* Magic required to prevent forcewake errors! 
*/ 2544 err = engine->emit_flush(rq, EMIT_INVALIDATE); 2545 if (err) 2546 return err; 2547 2548 cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2); 2549 if (IS_ERR(cs)) 2550 return PTR_ERR(cs); 2551 2552 /* Ensure the LRI have landed before we invalidate & continue */ 2553 *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED; 2554 for (i = GEN8_3LVL_PDPES; i--; ) { 2555 const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i); 2556 u32 base = engine->mmio_base; 2557 2558 *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i)); 2559 *cs++ = upper_32_bits(pd_daddr); 2560 *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i)); 2561 *cs++ = lower_32_bits(pd_daddr); 2562 } 2563 *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; 2564 intel_ring_advance(rq, cs); 2565 2566 2567 2568 return 0; 2569 } 2570 2571 static int execlists_request_alloc(struct i915_request *request) 2572 { 2573 int ret; 2574 2575 GEM_BUG_ON(!intel_context_is_pinned(request->context)); 2576 2577 /* 2578 * Flush enough space to reduce the likelihood of waiting after 2579 * we start building the request - in which case we will just 2580 * have to repeat work. 2581 */ 2582 request->reserved_space += EXECLISTS_REQUEST_SIZE; 2583 2584 /* 2585 * Note that after this point, we have committed to using 2586 * this request as it is being used to both track the 2587 * state of engine initialisation and liveness of the 2588 * golden renderstate above. Think twice before you try 2589 * to cancel/unwind this request now. 2590 */ 2591 2592 if (!i915_vm_is_4lvl(request->context->vm)) { 2593 ret = emit_pdps(request); 2594 if (ret) 2595 return ret; 2596 } 2597 2598 /* Unconditionally invalidate GPU caches and TLBs. */ 2599 ret = request->engine->emit_flush(request, EMIT_INVALIDATE); 2600 if (ret) 2601 return ret; 2602 2603 request->reserved_space -= EXECLISTS_REQUEST_SIZE; 2604 return 0; 2605 } 2606 2607 static void reset_csb_pointers(struct intel_engine_cs *engine) 2608 { 2609 struct intel_engine_execlists * const execlists = &engine->execlists; 2610 const unsigned int reset_value = execlists->csb_size - 1; 2611 2612 ring_set_paused(engine, 0); 2613 2614 /* 2615 * Sometimes Icelake forgets to reset its pointers on a GPU reset. 2616 * Bludgeon them with a mmio update to be sure. 2617 */ 2618 ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 2619 0xffff << 16 | reset_value << 8 | reset_value); 2620 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 2621 2622 /* 2623 * After a reset, the HW starts writing into CSB entry [0]. We 2624 * therefore have to set our HEAD pointer back one entry so that 2625 * the *first* entry we check is entry 0. To complicate this further, 2626 * as we don't wait for the first interrupt after reset, we have to 2627 * fake the HW write to point back to the last entry so that our 2628 * inline comparison of our cached head position against the last HW 2629 * write works even before the first interrupt. 2630 */ 2631 execlists->csb_head = reset_value; 2632 WRITE_ONCE(*execlists->csb_write, reset_value); 2633 wmb(); /* Make sure this is visible to HW (paranoia?) */ 2634 2635 /* Check that the GPU does indeed update the CSB entries!
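 *
 * Poisoning every entry with -1 below pairs with csb_read(): an entry
 * that still reads back as -1 after the write pointer has moved is
 * known to be stale and falls back to the mmio read in wa_csb_read().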
*/ 2636 memset(execlists->csb_status, -1, (reset_value + 1) * sizeof(u64)); 2637 invalidate_csb_entries(&execlists->csb_status[0], 2638 &execlists->csb_status[reset_value]); 2639 2640 /* Once more for luck and our trusty paranoia */ 2641 ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR, 2642 0xffff << 16 | reset_value << 8 | reset_value); 2643 ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR); 2644 2645 GEM_BUG_ON(READ_ONCE(*execlists->csb_write) != reset_value); 2646 } 2647 2648 static void sanitize_hwsp(struct intel_engine_cs *engine) 2649 { 2650 struct intel_timeline *tl; 2651 2652 list_for_each_entry(tl, &engine->status_page.timelines, engine_link) 2653 intel_timeline_reset_seqno(tl); 2654 } 2655 2656 static void execlists_sanitize(struct intel_engine_cs *engine) 2657 { 2658 GEM_BUG_ON(execlists_active(&engine->execlists)); 2659 2660 /* 2661 * Poison residual state on resume, in case the suspend didn't! 2662 * 2663 * We have to assume that across suspend/resume (or other loss 2664 * of control) the contents of our pinned buffers have been 2665 * lost, replaced by garbage. Since this doesn't always happen, 2666 * let's poison such state so that we more quickly spot when 2667 * we falsely assume it has been preserved. 2668 */ 2669 if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) 2670 memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE); 2671 2672 reset_csb_pointers(engine); 2673 2674 /* 2675 * The kernel_context HWSP is stored in the status_page. As above, 2676 * that may be lost on resume/initialisation, and so we need to 2677 * reset the value in the HWSP. 2678 */ 2679 sanitize_hwsp(engine); 2680 2681 /* And scrub the dirty cachelines for the HWSP */ 2682 clflush_cache_range(engine->status_page.addr, PAGE_SIZE); 2683 } 2684 2685 static void enable_error_interrupt(struct intel_engine_cs *engine) 2686 { 2687 u32 status; 2688 2689 engine->execlists.error_interrupt = 0; 2690 ENGINE_WRITE(engine, RING_EMR, ~0u); 2691 ENGINE_WRITE(engine, RING_EIR, ~0u); /* clear all existing errors */ 2692 2693 status = ENGINE_READ(engine, RING_ESR); 2694 if (unlikely(status)) { 2695 drm_err(&engine->i915->drm, 2696 "engine '%s' resumed still in error: %08x\n", 2697 engine->name, status); 2698 __intel_gt_reset(engine->gt, engine->mask); 2699 } 2700 2701 /* 2702 * On current gen8+, we have 2 signals to play with 2703 * 2704 * - I915_ERROR_INSTRUCTION (bit 0) 2705 * 2706 * Generate an error if the command parser encounters an invalid 2707 * instruction 2708 * 2709 * This is a fatal error. 2710 * 2711 * - CP_PRIV (bit 2) 2712 * 2713 * Generate an error on privilege violation (where the CP replaces 2714 * the instruction with a no-op). This also fires for writes into 2715 * read-only scratch pages. 2716 * 2717 * This is a non-fatal error, parsing continues. 2718 * 2719 * * there are a few others defined for odd HW that we do not use 2720 * 2721 * Since CP_PRIV fires for cases where we have chosen to ignore the 2722 * error (as the HW is validating and suppressing the mistakes), we 2723 * only unmask the instruction error bit.
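 *
 * That is, the EMR write below is ~I915_ERROR_INSTRUCTION: every other
 * error source stays masked and will not raise a CS-error interrupt.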
2724 */ 2725 ENGINE_WRITE(engine, RING_EMR, ~I915_ERROR_INSTRUCTION); 2726 } 2727 2728 static void enable_execlists(struct intel_engine_cs *engine) 2729 { 2730 u32 mode; 2731 2732 assert_forcewakes_active(engine->uncore, FORCEWAKE_ALL); 2733 2734 intel_engine_set_hwsp_writemask(engine, ~0u); /* HWSTAM */ 2735 2736 if (INTEL_GEN(engine->i915) >= 11) 2737 mode = _MASKED_BIT_ENABLE(GEN11_GFX_DISABLE_LEGACY_MODE); 2738 else 2739 mode = _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE); 2740 ENGINE_WRITE_FW(engine, RING_MODE_GEN7, mode); 2741 2742 ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING)); 2743 2744 ENGINE_WRITE_FW(engine, 2745 RING_HWS_PGA, 2746 i915_ggtt_offset(engine->status_page.vma)); 2747 ENGINE_POSTING_READ(engine, RING_HWS_PGA); 2748 2749 enable_error_interrupt(engine); 2750 } 2751 2752 static int execlists_resume(struct intel_engine_cs *engine) 2753 { 2754 intel_mocs_init_engine(engine); 2755 intel_breadcrumbs_reset(engine->breadcrumbs); 2756 2757 enable_execlists(engine); 2758 2759 return 0; 2760 } 2761 2762 static void execlists_reset_prepare(struct intel_engine_cs *engine) 2763 { 2764 struct intel_engine_execlists * const execlists = &engine->execlists; 2765 2766 ENGINE_TRACE(engine, "depth<-%d\n", 2767 atomic_read(&execlists->tasklet.count)); 2768 2769 /* 2770 * Prevent request submission to the hardware until we have 2771 * completed the reset in i915_gem_reset_finish(). If a request 2772 * is completed by one engine, it may then queue a request 2773 * to a second via its execlists->tasklet *just* as we are 2774 * calling engine->resume() and also writing the ELSP. 2775 * Turning off the execlists->tasklet until the reset is over 2776 * prevents the race. 2777 */ 2778 __tasklet_disable_sync_once(&execlists->tasklet); 2779 GEM_BUG_ON(!reset_in_progress(execlists)); 2780 2781 /* 2782 * We stop engines, otherwise we might get failed reset and a 2783 * dead gpu (on elk). Also as modern gpu as kbl can suffer 2784 * from system hang if batchbuffer is progressing when 2785 * the reset is issued, regardless of READY_TO_RESET ack. 2786 * Thus assume it is best to stop engines on all gens 2787 * where we have a gpu reset. 2788 * 2789 * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES) 2790 * 2791 * FIXME: Wa for more modern gens needs to be validated 2792 */ 2793 ring_set_paused(engine, 1); 2794 intel_engine_stop_cs(engine); 2795 2796 engine->execlists.reset_ccid = active_ccid(engine); 2797 } 2798 2799 static struct i915_request ** 2800 reset_csb(struct intel_engine_cs *engine, struct i915_request **inactive) 2801 { 2802 struct intel_engine_execlists * const execlists = &engine->execlists; 2803 2804 mb(); /* paranoia: read the CSB pointers from after the reset */ 2805 clflush(execlists->csb_write); 2806 mb(); 2807 2808 inactive = process_csb(engine, inactive); /* drain preemption events */ 2809 2810 /* Following the reset, we need to reload the CSB read/write pointers */ 2811 reset_csb_pointers(engine); 2812 2813 return inactive; 2814 } 2815 2816 static void 2817 execlists_reset_active(struct intel_engine_cs *engine, bool stalled) 2818 { 2819 struct intel_context *ce; 2820 struct i915_request *rq; 2821 u32 head; 2822 2823 /* 2824 * Save the currently executing context, even if we completed 2825 * its request, it was still running at the time of the 2826 * reset and will have been clobbered. 
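 *
 * The lookup below uses the CCID sampled in execlists_reset_prepare()
 * (execlists->reset_ccid) rather than re-reading the hardware, which
 * has likely been clobbered by the reset by now.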
2827 */ 2828 rq = active_context(engine, engine->execlists.reset_ccid); 2829 if (!rq) 2830 return; 2831 2832 ce = rq->context; 2833 GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); 2834 2835 if (__i915_request_is_complete(rq)) { 2836 /* Idle context; tidy up the ring so we can restart afresh */ 2837 head = intel_ring_wrap(ce->ring, rq->tail); 2838 goto out_replay; 2839 } 2840 2841 /* We still have requests in-flight; the engine should be active */ 2842 GEM_BUG_ON(!intel_engine_pm_is_awake(engine)); 2843 2844 /* Context has requests still in-flight; it should not be idle! */ 2845 GEM_BUG_ON(i915_active_is_idle(&ce->active)); 2846 2847 rq = active_request(ce->timeline, rq); 2848 head = intel_ring_wrap(ce->ring, rq->head); 2849 GEM_BUG_ON(head == ce->ring->tail); 2850 2851 /* 2852 * If this request hasn't started yet, e.g. it is waiting on a 2853 * semaphore, we need to avoid skipping the request or else we 2854 * break the signaling chain. However, if the context is corrupt 2855 * the request will not restart and we will be stuck with a wedged 2856 * device. It is quite often the case that if we issue a reset 2857 * while the GPU is loading the context image, that the context 2858 * image becomes corrupt. 2859 * 2860 * Otherwise, if we have not started yet, the request should replay 2861 * perfectly and we do not need to flag the result as being erroneous. 2862 */ 2863 if (!__i915_request_has_started(rq)) 2864 goto out_replay; 2865 2866 /* 2867 * If the request was innocent, we leave the request in the ELSP 2868 * and will try to replay it on restarting. The context image may 2869 * have been corrupted by the reset, in which case we may have 2870 * to service a new GPU hang, but more likely we can continue on 2871 * without impact. 2872 * 2873 * If the request was guilty, we presume the context is corrupt 2874 * and have to at least restore the RING register in the context 2875 * image back to the expected values to skip over the guilty request. 2876 */ 2877 __i915_request_reset(rq, stalled); 2878 2879 /* 2880 * We want a simple context + ring to execute the breadcrumb update. 2881 * We cannot rely on the context being intact across the GPU hang, 2882 * so clear it and rebuild just what we need for the breadcrumb. 2883 * All pending requests for this context will be zapped, and any 2884 * future request will be after userspace has had the opportunity 2885 * to recreate its own state. 2886 */ 2887 out_replay: 2888 ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n", 2889 head, ce->ring->tail); 2890 lrc_reset_regs(ce, engine); 2891 ce->lrc.lrca = lrc_update_regs(ce, engine, head); 2892 } 2893 2894 static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled) 2895 { 2896 struct intel_engine_execlists * const execlists = &engine->execlists; 2897 struct i915_request *post[2 * EXECLIST_MAX_PORTS]; 2898 struct i915_request **inactive; 2899 2900 rcu_read_lock(); 2901 inactive = reset_csb(engine, post); 2902 2903 execlists_reset_active(engine, true); 2904 2905 inactive = cancel_port_requests(execlists, inactive); 2906 post_process_csb(post, inactive); 2907 rcu_read_unlock(); 2908 } 2909 2910 static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) 2911 { 2912 unsigned long flags; 2913 2914 ENGINE_TRACE(engine, "\n"); 2915 2916 /* Process the csb, find the guilty context and throw away */ 2917 execlists_reset_csb(engine, stalled); 2918 2919 /* Push back any incomplete requests for replay after the reset. 
*/ 2920 rcu_read_lock(); 2921 spin_lock_irqsave(&engine->active.lock, flags); 2922 __unwind_incomplete_requests(engine); 2923 spin_unlock_irqrestore(&engine->active.lock, flags); 2924 rcu_read_unlock(); 2925 } 2926 2927 static void nop_submission_tasklet(struct tasklet_struct *t) 2928 { 2929 struct intel_engine_cs * const engine = 2930 from_tasklet(engine, t, execlists.tasklet); 2931 2932 /* The driver is wedged; don't process any more events. */ 2933 WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN); 2934 } 2935 2936 static void execlists_reset_cancel(struct intel_engine_cs *engine) 2937 { 2938 struct intel_engine_execlists * const execlists = &engine->execlists; 2939 struct i915_request *rq, *rn; 2940 struct rb_node *rb; 2941 unsigned long flags; 2942 2943 ENGINE_TRACE(engine, "\n"); 2944 2945 /* 2946 * Before we call engine->cancel_requests(), we should have exclusive 2947 * access to the submission state. This is arranged for us by the 2948 * caller disabling the interrupt generation, the tasklet and other 2949 * threads that may then access the same state, giving us a free hand 2950 * to reset state. However, we still need to let lockdep be aware that 2951 * we know this state may be accessed in hardirq context, so we 2952 * disable the irq around this manipulation and we want to keep 2953 * the spinlock focused on its duties and not accidentally conflate 2954 * coverage to the submission's irq state. (Similarly, although we 2955 * shouldn't need to disable irq around the manipulation of the 2956 * submission's irq state, we also wish to remind ourselves that 2957 * it is irq state.) 2958 */ 2959 execlists_reset_csb(engine, true); 2960 2961 rcu_read_lock(); 2962 spin_lock_irqsave(&engine->active.lock, flags); 2963 2964 /* Mark all executing requests as skipped. */ 2965 list_for_each_entry(rq, &engine->active.requests, sched.link) 2966 i915_request_put(i915_request_mark_eio(rq)); 2967 intel_engine_signal_breadcrumbs(engine); 2968 2969 /* Flush the queued requests to the timeline list (for retiring). 
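 *
 * Each queued request is marked with -EIO and pushed through
 * __i915_request_submit() so that it reaches its timeline and can be
 * retired (and signalled) normally, just without touching the HW.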
*/ 2970 while ((rb = rb_first_cached(&execlists->queue))) { 2971 struct i915_priolist *p = to_priolist(rb); 2972 2973 priolist_for_each_request_consume(rq, rn, p) { 2974 if (i915_request_mark_eio(rq)) { 2975 __i915_request_submit(rq); 2976 i915_request_put(rq); 2977 } 2978 } 2979 2980 rb_erase_cached(&p->node, &execlists->queue); 2981 i915_priolist_free(p); 2982 } 2983 2984 /* On-hold requests will be flushed to timeline upon their release */ 2985 list_for_each_entry(rq, &engine->active.hold, sched.link) 2986 i915_request_put(i915_request_mark_eio(rq)); 2987 2988 /* Cancel all attached virtual engines */ 2989 while ((rb = rb_first_cached(&execlists->virtual))) { 2990 struct virtual_engine *ve = 2991 rb_entry(rb, typeof(*ve), nodes[engine->id].rb); 2992 2993 rb_erase_cached(rb, &execlists->virtual); 2994 RB_CLEAR_NODE(rb); 2995 2996 spin_lock(&ve->base.active.lock); 2997 rq = fetch_and_zero(&ve->request); 2998 if (rq) { 2999 if (i915_request_mark_eio(rq)) { 3000 rq->engine = engine; 3001 __i915_request_submit(rq); 3002 i915_request_put(rq); 3003 } 3004 i915_request_put(rq); 3005 3006 ve->base.execlists.queue_priority_hint = INT_MIN; 3007 } 3008 spin_unlock(&ve->base.active.lock); 3009 } 3010 3011 /* Remaining _unready_ requests will be nop'ed when submitted */ 3012 3013 execlists->queue_priority_hint = INT_MIN; 3014 execlists->queue = RB_ROOT_CACHED; 3015 3016 GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); 3017 execlists->tasklet.callback = nop_submission_tasklet; 3018 3019 spin_unlock_irqrestore(&engine->active.lock, flags); 3020 rcu_read_unlock(); 3021 } 3022 3023 static void execlists_reset_finish(struct intel_engine_cs *engine) 3024 { 3025 struct intel_engine_execlists * const execlists = &engine->execlists; 3026 3027 /* 3028 * After a GPU reset, we may have requests to replay. Do so now while 3029 * we still have the forcewake to be sure that the GPU is not allowed 3030 * to sleep before we restart and reload a context. 3031 * 3032 * If the GPU reset fails, the engine may still be alive with requests 3033 * inflight. We expect those to complete, or for the device to be 3034 * reset as the next level of recovery, and as a final resort we 3035 * will declare the device wedged. 3036 */ 3037 GEM_BUG_ON(!reset_in_progress(execlists)); 3038 3039 /* And kick in case we missed a new request submission. 
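 *
 * __tasklet_enable() only reports true once the disable count taken in
 * execlists_reset_prepare() drops back to zero, i.e. the kick below
 * only fires once submission is truly re-enabled.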
*/ 3040 if (__tasklet_enable(&execlists->tasklet)) 3041 __execlists_kick(execlists); 3042 3043 ENGINE_TRACE(engine, "depth->%d\n", 3044 atomic_read(&execlists->tasklet.count)); 3045 } 3046 3047 static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine) 3048 { 3049 ENGINE_WRITE(engine, RING_IMR, 3050 ~(engine->irq_enable_mask | engine->irq_keep_mask)); 3051 ENGINE_POSTING_READ(engine, RING_IMR); 3052 } 3053 3054 static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine) 3055 { 3056 ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask); 3057 } 3058 3059 static void execlists_park(struct intel_engine_cs *engine) 3060 { 3061 cancel_timer(&engine->execlists.timer); 3062 cancel_timer(&engine->execlists.preempt); 3063 } 3064 3065 static bool can_preempt(struct intel_engine_cs *engine) 3066 { 3067 if (INTEL_GEN(engine->i915) > 8) 3068 return true; 3069 3070 /* GPGPU on bdw requires extra w/a; not implemented */ 3071 return engine->class != RENDER_CLASS; 3072 } 3073 3074 static void execlists_set_default_submission(struct intel_engine_cs *engine) 3075 { 3076 engine->submit_request = execlists_submit_request; 3077 engine->schedule = i915_schedule; 3078 engine->execlists.tasklet.callback = execlists_submission_tasklet; 3079 3080 engine->reset.prepare = execlists_reset_prepare; 3081 engine->reset.rewind = execlists_reset_rewind; 3082 engine->reset.cancel = execlists_reset_cancel; 3083 engine->reset.finish = execlists_reset_finish; 3084 3085 engine->park = execlists_park; 3086 engine->unpark = NULL; 3087 3088 engine->flags |= I915_ENGINE_SUPPORTS_STATS; 3089 if (!intel_vgpu_active(engine->i915)) { 3090 engine->flags |= I915_ENGINE_HAS_SEMAPHORES; 3091 if (can_preempt(engine)) { 3092 engine->flags |= I915_ENGINE_HAS_PREEMPTION; 3093 if (IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION)) 3094 engine->flags |= I915_ENGINE_HAS_TIMESLICES; 3095 } 3096 } 3097 3098 if (intel_engine_has_preemption(engine)) 3099 engine->emit_bb_start = gen8_emit_bb_start; 3100 else 3101 engine->emit_bb_start = gen8_emit_bb_start_noarb; 3102 } 3103 3104 static void execlists_shutdown(struct intel_engine_cs *engine) 3105 { 3106 /* Synchronise with residual timers and any softirq they raise */ 3107 del_timer_sync(&engine->execlists.timer); 3108 del_timer_sync(&engine->execlists.preempt); 3109 tasklet_kill(&engine->execlists.tasklet); 3110 } 3111 3112 static void execlists_release(struct intel_engine_cs *engine) 3113 { 3114 engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ 3115 3116 execlists_shutdown(engine); 3117 3118 intel_engine_cleanup_common(engine); 3119 lrc_fini_wa_ctx(engine); 3120 } 3121 3122 static void 3123 logical_ring_default_vfuncs(struct intel_engine_cs *engine) 3124 { 3125 /* Default vfuncs which can be overridden by each engine. 
*/ 3126 3127 engine->resume = execlists_resume; 3128 3129 engine->cops = &execlists_context_ops; 3130 engine->request_alloc = execlists_request_alloc; 3131 3132 engine->emit_flush = gen8_emit_flush_xcs; 3133 engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; 3134 engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_xcs; 3135 if (INTEL_GEN(engine->i915) >= 12) { 3136 engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_xcs; 3137 engine->emit_flush = gen12_emit_flush_xcs; 3138 } 3139 engine->set_default_submission = execlists_set_default_submission; 3140 3141 if (INTEL_GEN(engine->i915) < 11) { 3142 engine->irq_enable = gen8_logical_ring_enable_irq; 3143 engine->irq_disable = gen8_logical_ring_disable_irq; 3144 } else { 3145 /* 3146 * TODO: On Gen11 interrupt masks need to be clear 3147 * to allow C6 entry. Keep interrupts enabled at 3148 * and take the hit of generating extra interrupts 3149 * until a more refined solution exists. 3150 */ 3151 } 3152 } 3153 3154 static void logical_ring_default_irqs(struct intel_engine_cs *engine) 3155 { 3156 unsigned int shift = 0; 3157 3158 if (INTEL_GEN(engine->i915) < 11) { 3159 const u8 irq_shifts[] = { 3160 [RCS0] = GEN8_RCS_IRQ_SHIFT, 3161 [BCS0] = GEN8_BCS_IRQ_SHIFT, 3162 [VCS0] = GEN8_VCS0_IRQ_SHIFT, 3163 [VCS1] = GEN8_VCS1_IRQ_SHIFT, 3164 [VECS0] = GEN8_VECS_IRQ_SHIFT, 3165 }; 3166 3167 shift = irq_shifts[engine->id]; 3168 } 3169 3170 engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift; 3171 engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift; 3172 engine->irq_keep_mask |= GT_CS_MASTER_ERROR_INTERRUPT << shift; 3173 engine->irq_keep_mask |= GT_WAIT_SEMAPHORE_INTERRUPT << shift; 3174 } 3175 3176 static void rcs_submission_override(struct intel_engine_cs *engine) 3177 { 3178 switch (INTEL_GEN(engine->i915)) { 3179 case 12: 3180 engine->emit_flush = gen12_emit_flush_rcs; 3181 engine->emit_fini_breadcrumb = gen12_emit_fini_breadcrumb_rcs; 3182 break; 3183 case 11: 3184 engine->emit_flush = gen11_emit_flush_rcs; 3185 engine->emit_fini_breadcrumb = gen11_emit_fini_breadcrumb_rcs; 3186 break; 3187 default: 3188 engine->emit_flush = gen8_emit_flush_rcs; 3189 engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; 3190 break; 3191 } 3192 } 3193 3194 int intel_execlists_submission_setup(struct intel_engine_cs *engine) 3195 { 3196 struct intel_engine_execlists * const execlists = &engine->execlists; 3197 struct drm_i915_private *i915 = engine->i915; 3198 struct intel_uncore *uncore = engine->uncore; 3199 u32 base = engine->mmio_base; 3200 3201 tasklet_setup(&engine->execlists.tasklet, execlists_submission_tasklet); 3202 timer_setup(&engine->execlists.timer, execlists_timeslice, 0); 3203 timer_setup(&engine->execlists.preempt, execlists_preempt, 0); 3204 3205 logical_ring_default_vfuncs(engine); 3206 logical_ring_default_irqs(engine); 3207 3208 if (engine->class == RENDER_CLASS) 3209 rcs_submission_override(engine); 3210 3211 lrc_init_wa_ctx(engine); 3212 3213 if (HAS_LOGICAL_RING_ELSQ(i915)) { 3214 execlists->submit_reg = uncore->regs + 3215 i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(base)); 3216 execlists->ctrl_reg = uncore->regs + 3217 i915_mmio_reg_offset(RING_EXECLIST_CONTROL(base)); 3218 } else { 3219 execlists->submit_reg = uncore->regs + 3220 i915_mmio_reg_offset(RING_ELSP(base)); 3221 } 3222 3223 execlists->csb_status = 3224 (u64 *)&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; 3225 3226 execlists->csb_write = 3227 &engine->status_page.addr[intel_hws_csb_write_index(i915)]; 3228 3229 if (INTEL_GEN(i915) 
< 11) 3230 execlists->csb_size = GEN8_CSB_ENTRIES; 3231 else 3232 execlists->csb_size = GEN11_CSB_ENTRIES; 3233 3234 engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); 3235 if (INTEL_GEN(engine->i915) >= 11) { 3236 execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32); 3237 execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32); 3238 } 3239 3240 /* Finally, take ownership and responsibility for cleanup! */ 3241 engine->sanitize = execlists_sanitize; 3242 engine->release = execlists_release; 3243 3244 return 0; 3245 } 3246 3247 static struct list_head *virtual_queue(struct virtual_engine *ve) 3248 { 3249 return &ve->base.execlists.default_priolist.requests; 3250 } 3251 3252 static void rcu_virtual_context_destroy(struct work_struct *wrk) 3253 { 3254 struct virtual_engine *ve = 3255 container_of(wrk, typeof(*ve), rcu.work); 3256 unsigned int n; 3257 3258 GEM_BUG_ON(ve->context.inflight); 3259 3260 /* Preempt-to-busy may leave a stale request behind. */ 3261 if (unlikely(ve->request)) { 3262 struct i915_request *old; 3263 3264 spin_lock_irq(&ve->base.active.lock); 3265 3266 old = fetch_and_zero(&ve->request); 3267 if (old) { 3268 GEM_BUG_ON(!__i915_request_is_complete(old)); 3269 __i915_request_submit(old); 3270 i915_request_put(old); 3271 } 3272 3273 spin_unlock_irq(&ve->base.active.lock); 3274 } 3275 3276 /* 3277 * Flush the tasklet in case it is still running on another core. 3278 * 3279 * This needs to be done before we remove ourselves from the siblings' 3280 * rbtrees as in the case it is running in parallel, it may reinsert 3281 * the rb_node into a sibling. 3282 */ 3283 tasklet_kill(&ve->base.execlists.tasklet); 3284 3285 /* Decouple ourselves from the siblings, no more access allowed. */ 3286 for (n = 0; n < ve->num_siblings; n++) { 3287 struct intel_engine_cs *sibling = ve->siblings[n]; 3288 struct rb_node *node = &ve->nodes[sibling->id].rb; 3289 3290 if (RB_EMPTY_NODE(node)) 3291 continue; 3292 3293 spin_lock_irq(&sibling->active.lock); 3294 3295 /* Detachment is lazily performed in the execlists tasklet */ 3296 if (!RB_EMPTY_NODE(node)) 3297 rb_erase_cached(node, &sibling->execlists.virtual); 3298 3299 spin_unlock_irq(&sibling->active.lock); 3300 } 3301 GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); 3302 GEM_BUG_ON(!list_empty(virtual_queue(ve))); 3303 3304 lrc_fini(&ve->context); 3305 intel_context_fini(&ve->context); 3306 3307 intel_breadcrumbs_free(ve->base.breadcrumbs); 3308 intel_engine_free_request_pool(&ve->base); 3309 3310 kfree(ve->bonds); 3311 kfree(ve); 3312 } 3313 3314 static void virtual_context_destroy(struct kref *kref) 3315 { 3316 struct virtual_engine *ve = 3317 container_of(kref, typeof(*ve), context.ref); 3318 3319 GEM_BUG_ON(!list_empty(&ve->context.signals)); 3320 3321 /* 3322 * When destroying the virtual engine, we have to be aware that 3323 * it may still be in use from an hardirq/softirq context causing 3324 * the resubmission of a completed request (background completion 3325 * due to preempt-to-busy). Before we can free the engine, we need 3326 * to flush the submission code and tasklets that are still potentially 3327 * accessing the engine. Flushing the tasklets requires process context, 3328 * and since we can guard the resubmit onto the engine with an RCU read 3329 * lock, we can delegate the free of the engine to an RCU worker. 
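 *
 * Hence the INIT_RCU_WORK/queue_rcu_work() pair below: the work item
 * only runs after an RCU grace period, by which time any resubmission
 * path that found this engine under rcu_read_lock() has finished, and
 * it runs in process context where tasklet_kill() and the sibling
 * locks may be taken safely.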
3330 */ 3331 INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy); 3332 queue_rcu_work(system_wq, &ve->rcu); 3333 } 3334 3335 static void virtual_engine_initial_hint(struct virtual_engine *ve) 3336 { 3337 int swp; 3338 3339 /* 3340 * Pick a random sibling on starting to help spread the load around. 3341 * 3342 * New contexts are typically created with exactly the same order 3343 * of siblings, and often started in batches. Due to the way we iterate 3344 * the array of sibling when submitting requests, sibling[0] is 3345 * prioritised for dequeuing. If we make sure that sibling[0] is fairly 3346 * randomised across the system, we also help spread the load by the 3347 * first engine we inspect being different each time. 3348 * 3349 * NB This does not force us to execute on this engine, it will just 3350 * typically be the first we inspect for submission. 3351 */ 3352 swp = prandom_u32_max(ve->num_siblings); 3353 if (swp) 3354 swap(ve->siblings[swp], ve->siblings[0]); 3355 } 3356 3357 static int virtual_context_alloc(struct intel_context *ce) 3358 { 3359 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3360 3361 return lrc_alloc(ce, ve->siblings[0]); 3362 } 3363 3364 static int virtual_context_pre_pin(struct intel_context *ce, 3365 struct i915_gem_ww_ctx *ww, 3366 void **vaddr) 3367 { 3368 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3369 3370 /* Note: we must use a real engine class for setting up reg state */ 3371 return __execlists_context_pre_pin(ce, ve->siblings[0], ww, vaddr); 3372 } 3373 3374 static int virtual_context_pin(struct intel_context *ce, void *vaddr) 3375 { 3376 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3377 3378 return lrc_pin(ce, ve->siblings[0], vaddr); 3379 } 3380 3381 static void virtual_context_enter(struct intel_context *ce) 3382 { 3383 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3384 unsigned int n; 3385 3386 for (n = 0; n < ve->num_siblings; n++) 3387 intel_engine_pm_get(ve->siblings[n]); 3388 3389 intel_timeline_enter(ce->timeline); 3390 } 3391 3392 static void virtual_context_exit(struct intel_context *ce) 3393 { 3394 struct virtual_engine *ve = container_of(ce, typeof(*ve), context); 3395 unsigned int n; 3396 3397 intel_timeline_exit(ce->timeline); 3398 3399 for (n = 0; n < ve->num_siblings; n++) 3400 intel_engine_pm_put(ve->siblings[n]); 3401 } 3402 3403 static const struct intel_context_ops virtual_context_ops = { 3404 .flags = COPS_HAS_INFLIGHT, 3405 3406 .alloc = virtual_context_alloc, 3407 3408 .pre_pin = virtual_context_pre_pin, 3409 .pin = virtual_context_pin, 3410 .unpin = lrc_unpin, 3411 .post_unpin = lrc_post_unpin, 3412 3413 .enter = virtual_context_enter, 3414 .exit = virtual_context_exit, 3415 3416 .destroy = virtual_context_destroy, 3417 }; 3418 3419 static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) 3420 { 3421 struct i915_request *rq; 3422 intel_engine_mask_t mask; 3423 3424 rq = READ_ONCE(ve->request); 3425 if (!rq) 3426 return 0; 3427 3428 /* The rq is ready for submission; rq->execution_mask is now stable. 
*/ 3429 mask = rq->execution_mask; 3430 if (unlikely(!mask)) { 3431 /* Invalid selection, submit to a random engine in error */ 3432 i915_request_set_error_once(rq, -ENODEV); 3433 mask = ve->siblings[0]->mask; 3434 } 3435 3436 ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n", 3437 rq->fence.context, rq->fence.seqno, 3438 mask, ve->base.execlists.queue_priority_hint); 3439 3440 return mask; 3441 } 3442 3443 static void virtual_submission_tasklet(struct tasklet_struct *t) 3444 { 3445 struct virtual_engine * const ve = 3446 from_tasklet(ve, t, base.execlists.tasklet); 3447 const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint); 3448 intel_engine_mask_t mask; 3449 unsigned int n; 3450 3451 rcu_read_lock(); 3452 mask = virtual_submission_mask(ve); 3453 rcu_read_unlock(); 3454 if (unlikely(!mask)) 3455 return; 3456 3457 for (n = 0; n < ve->num_siblings; n++) { 3458 struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); 3459 struct ve_node * const node = &ve->nodes[sibling->id]; 3460 struct rb_node **parent, *rb; 3461 bool first; 3462 3463 if (!READ_ONCE(ve->request)) 3464 break; /* already handled by a sibling's tasklet */ 3465 3466 spin_lock_irq(&sibling->active.lock); 3467 3468 if (unlikely(!(mask & sibling->mask))) { 3469 if (!RB_EMPTY_NODE(&node->rb)) { 3470 rb_erase_cached(&node->rb, 3471 &sibling->execlists.virtual); 3472 RB_CLEAR_NODE(&node->rb); 3473 } 3474 3475 goto unlock_engine; 3476 } 3477 3478 if (unlikely(!RB_EMPTY_NODE(&node->rb))) { 3479 /* 3480 * Cheat and avoid rebalancing the tree if we can 3481 * reuse this node in situ. 3482 */ 3483 first = rb_first_cached(&sibling->execlists.virtual) == 3484 &node->rb; 3485 if (prio == node->prio || (prio > node->prio && first)) 3486 goto submit_engine; 3487 3488 rb_erase_cached(&node->rb, &sibling->execlists.virtual); 3489 } 3490 3491 rb = NULL; 3492 first = true; 3493 parent = &sibling->execlists.virtual.rb_root.rb_node; 3494 while (*parent) { 3495 struct ve_node *other; 3496 3497 rb = *parent; 3498 other = rb_entry(rb, typeof(*other), rb); 3499 if (prio > other->prio) { 3500 parent = &rb->rb_left; 3501 } else { 3502 parent = &rb->rb_right; 3503 first = false; 3504 } 3505 } 3506 3507 rb_link_node(&node->rb, rb, parent); 3508 rb_insert_color_cached(&node->rb, 3509 &sibling->execlists.virtual, 3510 first); 3511 3512 submit_engine: 3513 GEM_BUG_ON(RB_EMPTY_NODE(&node->rb)); 3514 node->prio = prio; 3515 if (first && prio > sibling->execlists.queue_priority_hint) 3516 tasklet_hi_schedule(&sibling->execlists.tasklet); 3517 3518 unlock_engine: 3519 spin_unlock_irq(&sibling->active.lock); 3520 3521 if (intel_context_inflight(&ve->context)) 3522 break; 3523 } 3524 } 3525 3526 static void virtual_submit_request(struct i915_request *rq) 3527 { 3528 struct virtual_engine *ve = to_virtual_engine(rq->engine); 3529 unsigned long flags; 3530 3531 ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", 3532 rq->fence.context, 3533 rq->fence.seqno); 3534 3535 GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); 3536 3537 spin_lock_irqsave(&ve->base.active.lock, flags); 3538 3539 /* By the time we resubmit a request, it may be completed */ 3540 if (__i915_request_is_complete(rq)) { 3541 __i915_request_submit(rq); 3542 goto unlock; 3543 } 3544 3545 if (ve->request) { /* background completion from preempt-to-busy */ 3546 GEM_BUG_ON(!__i915_request_is_complete(ve->request)); 3547 __i915_request_submit(ve->request); 3548 i915_request_put(ve->request); 3549 } 3550 3551 ve->base.execlists.queue_priority_hint = rq_prio(rq); 3552 
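/*
 * Hold our own reference for as long as the request is parked in
 * ve->request; it is dropped by whichever path finally consumes or
 * cancels the virtual submission.
 */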
static void virtual_submit_request(struct i915_request *rq)
{
	struct virtual_engine *ve = to_virtual_engine(rq->engine);
	unsigned long flags;

	ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n",
		     rq->fence.context,
		     rq->fence.seqno);

	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);

	spin_lock_irqsave(&ve->base.active.lock, flags);

	/* By the time we resubmit a request, it may be completed */
	if (__i915_request_is_complete(rq)) {
		__i915_request_submit(rq);
		goto unlock;
	}

	if (ve->request) { /* background completion from preempt-to-busy */
		GEM_BUG_ON(!__i915_request_is_complete(ve->request));
		__i915_request_submit(ve->request);
		i915_request_put(ve->request);
	}

	ve->base.execlists.queue_priority_hint = rq_prio(rq);
	ve->request = i915_request_get(rq);

	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
	list_move_tail(&rq->sched.link, virtual_queue(ve));

	tasklet_hi_schedule(&ve->base.execlists.tasklet);

unlock:
	spin_unlock_irqrestore(&ve->base.active.lock, flags);
}

static struct ve_bond *
virtual_find_bond(struct virtual_engine *ve,
		  const struct intel_engine_cs *master)
{
	int i;

	for (i = 0; i < ve->num_bonds; i++) {
		if (ve->bonds[i].master == master)
			return &ve->bonds[i];
	}

	return NULL;
}

static void
virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
{
	struct virtual_engine *ve = to_virtual_engine(rq->engine);
	intel_engine_mask_t allowed, exec;
	struct ve_bond *bond;

	allowed = ~to_request(signal)->engine->mask;

	bond = virtual_find_bond(ve, to_request(signal)->engine);
	if (bond)
		allowed &= bond->sibling_mask;

	/* Restrict the bonded request to run on only the available engines */
	exec = READ_ONCE(rq->execution_mask);
	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
		;

	/* Prevent the master from being re-run on the bonded engines */
	to_request(signal)->execution_mask &= ~allowed;
}
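/*
 * virtual_bond_execute() above is pure mask arithmetic. A small worked
 * example, again with hypothetical single-bit engine masks chosen only for
 * illustration (master_rq stands in for to_request(signal)):
 *
 *	// master runs on vcs0 (mask 0b01); the bond registered for that
 *	// master lists vcs1 (sibling_mask 0b10); the bonded rq may still
 *	// run on either sibling (execution_mask 0b11).
 *	allowed = ~0b01;			// everything but the master's engine
 *	allowed &= 0b10;			// intersect with the bond's siblings
 *	rq->execution_mask &= allowed;		// 0b11 & 0b10 == 0b10 -> vcs1 only
 *
 *	// and the master is kept off the bonded engines:
 *	master_rq->execution_mask &= ~allowed;	// master stays on vcs0
 */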
struct intel_context *
intel_execlists_create_virtual(struct intel_engine_cs **siblings,
			       unsigned int count)
{
	struct virtual_engine *ve;
	unsigned int n;
	int err;

	if (count == 0)
		return ERR_PTR(-EINVAL);

	if (count == 1)
		return intel_context_create(siblings[0]);

	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
	if (!ve)
		return ERR_PTR(-ENOMEM);

	ve->base.i915 = siblings[0]->i915;
	ve->base.gt = siblings[0]->gt;
	ve->base.uncore = siblings[0]->uncore;
	ve->base.id = -1;

	ve->base.class = OTHER_CLASS;
	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;

	/*
	 * The decision on whether to submit a request using semaphores
	 * depends on the saturated state of the engine. We only compute
	 * this during HW submission of the request, and we need this
	 * state to be applied globally to all requests being submitted
	 * to this engine. Virtual engines encompass more than one physical
	 * engine and so we cannot accurately tell in advance if one of those
	 * engines is already saturated and so cannot afford to use a semaphore
	 * and be pessimized in priority for doing so -- if we are the only
	 * context using semaphores after all other clients have stopped, we
	 * will be starved on the saturated system. Such a global switch for
	 * semaphores is less than ideal, but alas is the current compromise.
	 */
	ve->base.saturated = ALL_ENGINES;

	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");

	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
	intel_engine_init_execlists(&ve->base);

	ve->base.cops = &virtual_context_ops;
	ve->base.request_alloc = execlists_request_alloc;

	ve->base.schedule = i915_schedule;
	ve->base.submit_request = virtual_submit_request;
	ve->base.bond_execute = virtual_bond_execute;

	INIT_LIST_HEAD(virtual_queue(ve));
	ve->base.execlists.queue_priority_hint = INT_MIN;
	tasklet_setup(&ve->base.execlists.tasklet, virtual_submission_tasklet);

	intel_context_init(&ve->context, &ve->base);

	ve->base.breadcrumbs = intel_breadcrumbs_create(NULL);
	if (!ve->base.breadcrumbs) {
		err = -ENOMEM;
		goto err_put;
	}

	for (n = 0; n < count; n++) {
		struct intel_engine_cs *sibling = siblings[n];

		GEM_BUG_ON(!is_power_of_2(sibling->mask));
		if (sibling->mask & ve->base.mask) {
			DRM_DEBUG("duplicate %s entry in load balancer\n",
				  sibling->name);
			err = -EINVAL;
			goto err_put;
		}

		/*
		 * The virtual engine implementation is tightly coupled to
		 * the execlists backend -- we push requests directly
		 * into a tree inside each physical engine. We could support
		 * layering if we handle cloning of the requests and
		 * submitting a copy into each backend.
		 */
		if (sibling->execlists.tasklet.callback !=
		    execlists_submission_tasklet) {
			err = -ENODEV;
			goto err_put;
		}

		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);

		ve->siblings[ve->num_siblings++] = sibling;
		ve->base.mask |= sibling->mask;

		/*
		 * All physical engines must be compatible for their emission
		 * functions (as we build the instructions during request
		 * construction and do not alter them before submission
		 * on the physical engine). We use the engine class as a guide
		 * here, although that could be refined.
		 */
		if (ve->base.class != OTHER_CLASS) {
			if (ve->base.class != sibling->class) {
				DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
					  sibling->class, ve->base.class);
				err = -EINVAL;
				goto err_put;
			}
			continue;
		}

		ve->base.class = sibling->class;
		ve->base.uabi_class = sibling->uabi_class;
		snprintf(ve->base.name, sizeof(ve->base.name),
			 "v%dx%d", ve->base.class, count);
		ve->base.context_size = sibling->context_size;

		ve->base.emit_bb_start = sibling->emit_bb_start;
		ve->base.emit_flush = sibling->emit_flush;
		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
		ve->base.emit_fini_breadcrumb_dw =
			sibling->emit_fini_breadcrumb_dw;

		ve->base.flags = sibling->flags;
	}

	ve->base.flags |= I915_ENGINE_IS_VIRTUAL;

	virtual_engine_initial_hint(ve);
	return &ve->context;

err_put:
	intel_context_put(&ve->context);
	return ERR_PTR(err);
}

struct intel_context *
intel_execlists_clone_virtual(struct intel_engine_cs *src)
{
	struct virtual_engine *se = to_virtual_engine(src);
	struct intel_context *dst;

	dst = intel_execlists_create_virtual(se->siblings,
					     se->num_siblings);
	if (IS_ERR(dst))
		return dst;

	if (se->num_bonds) {
		struct virtual_engine *de = to_virtual_engine(dst->engine);

		de->bonds = kmemdup(se->bonds,
				    sizeof(*se->bonds) * se->num_bonds,
				    GFP_KERNEL);
		if (!de->bonds) {
			intel_context_put(dst);
			return ERR_PTR(-ENOMEM);
		}

		de->num_bonds = se->num_bonds;
	}

	return dst;
}

int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
				     const struct intel_engine_cs *master,
				     const struct intel_engine_cs *sibling)
{
	struct virtual_engine *ve = to_virtual_engine(engine);
	struct ve_bond *bond;
	int n;

	/* Sanity check the sibling is part of the virtual engine */
	for (n = 0; n < ve->num_siblings; n++)
		if (sibling == ve->siblings[n])
			break;
	if (n == ve->num_siblings)
		return -EINVAL;

	bond = virtual_find_bond(ve, master);
	if (bond) {
		bond->sibling_mask |= sibling->mask;
		return 0;
	}

	bond = krealloc(ve->bonds,
			sizeof(*bond) * (ve->num_bonds + 1),
			GFP_KERNEL);
	if (!bond)
		return -ENOMEM;

	bond[ve->num_bonds].master = master;
	bond[ve->num_bonds].sibling_mask = sibling->mask;

	ve->bonds = bond;
	ve->num_bonds++;

	return 0;
}
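/*
 * A minimal usage sketch for the two entry points above, assuming the caller
 * already holds an array of sibling physical engines (say two video engines);
 * the identifiers vcs0/vcs1 are placeholders for illustration only, and error
 * handling is reduced to the bare minimum:
 *
 *	struct intel_engine_cs *siblings[] = { vcs0, vcs1 };
 *	struct intel_context *ce;
 *	int err;
 *
 *	ce = intel_execlists_create_virtual(siblings, ARRAY_SIZE(siblings));
 *	if (IS_ERR(ce))
 *		return PTR_ERR(ce);
 *
 *	// Optionally tie the virtual engine to a master for bonded
 *	// submission: when a request on vcs0 signals, the bonded request
 *	// is restricted to vcs1 by virtual_bond_execute().
 *	err = intel_virtual_engine_attach_bond(ce->engine, vcs0, vcs1);
 *	if (err) {
 *		intel_context_put(ce);
 *		return err;
 *	}
 *
 * Requests created against the returned context are then load-balanced
 * across the siblings by virtual_submit_request() and
 * virtual_submission_tasklet().
 */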
void intel_execlists_show_requests(struct intel_engine_cs *engine,
				   struct drm_printer *m,
				   void (*show_request)(struct drm_printer *m,
							const struct i915_request *rq,
							const char *prefix,
							int indent),
				   unsigned int max)
{
	const struct intel_engine_execlists *execlists = &engine->execlists;
	struct i915_request *rq, *last;
	unsigned long flags;
	unsigned int count;
	struct rb_node *rb;

	spin_lock_irqsave(&engine->active.lock, flags);

	last = NULL;
	count = 0;
	list_for_each_entry(rq, &engine->active.requests, sched.link) {
		if (count++ < max - 1)
			show_request(m, rq, "\t\t", 0);
		else
			last = rq;
	}
	if (last) {
		if (count > max) {
			drm_printf(m,
				   "\t\t...skipping %d executing requests...\n",
				   count - max);
		}
		show_request(m, last, "\t\t", 0);
	}

	if (execlists->queue_priority_hint != INT_MIN)
		drm_printf(m, "\t\tQueue priority hint: %d\n",
			   READ_ONCE(execlists->queue_priority_hint));

	last = NULL;
	count = 0;
	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);

		priolist_for_each_request(rq, p) {
			if (count++ < max - 1)
				show_request(m, rq, "\t\t", 0);
			else
				last = rq;
		}
	}
	if (last) {
		if (count > max) {
			drm_printf(m,
				   "\t\t...skipping %d queued requests...\n",
				   count - max);
		}
		show_request(m, last, "\t\t", 0);
	}

	last = NULL;
	count = 0;
	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
		struct virtual_engine *ve =
			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
		struct i915_request *rq = READ_ONCE(ve->request);

		if (rq) {
			if (count++ < max - 1)
				show_request(m, rq, "\t\t", 0);
			else
				last = rq;
		}
	}
	if (last) {
		if (count > max) {
			drm_printf(m,
				   "\t\t...skipping %d virtual requests...\n",
				   count - max);
		}
		show_request(m, last, "\t\t", 0);
	}

	spin_unlock_irqrestore(&engine->active.lock, flags);
}

bool
intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
{
	return engine->set_default_submission ==
	       execlists_set_default_submission;
}

#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
#include "selftest_execlists.c"
#endif
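/*
 * For reference, a minimal sketch of a callback matching the show_request
 * signature accepted by intel_execlists_show_requests() above. The name and
 * output format are illustrative only; the driver's real debugfs helpers
 * provide a far more detailed printer:
 *
 *	static void example_show_request(struct drm_printer *m,
 *					 const struct i915_request *rq,
 *					 const char *prefix, int indent)
 *	{
 *		drm_printf(m, "%s%*srq %llx:%lld, prio %d\n",
 *			   prefix, indent, "",
 *			   rq->fence.context, rq->fence.seqno, rq_prio(rq));
 *	}
 *
 *	// intel_execlists_show_requests(engine, &p, example_show_request, 8);
 */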