=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity.  For example, a thread may request a VCPU to flush
its TLB with a VCPU request.  The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request.  This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
a guest mode exit.  However, a VCPU thread may not be in guest mode at the
time of the kick.  Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.  All three actions
are listed below:

1) Send an IPI.  This forces a guest mode exit.
2) Wake a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues.  Waking them removes the threads from
   the waitqueues, allowing the threads to run again.  This behavior
   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing.
   When the VCPU is not in guest mode and the VCPU thread is not
   sleeping, then there is nothing to do.

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
guest is running in guest mode or not, as well as some specific
outside guest mode states.  The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements").  The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.  The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_MMU_RELOAD

  When shadow page tables are used and memory slots are removed it's
  necessary to inform each VCPU to completely refresh the tables.  This
  request is used for that.

KVM_REQ_PENDING_TIMER

  This request may be made from a timer handler run on the host on behalf
  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()).  When that
  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops.  This is because only the lower 8 bits are used to represent the
request's number.  The upper bits are used as flags.  Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
  to be awakened for these requests.  Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding.  This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait.  This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request.
This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit.  Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.  See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request.  We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode.
This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI.  One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU.  With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check.  This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts.  This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.

IPI Reduction
-------------

Because only one IPI is needed to get a VCPU to check for any/all
requests, multiple IPIs may be coalesced.  This is easily done by having
the first IPI-sending kick also change the VCPU mode to something
!IN_GUEST_MODE.  The transitional state, EXITING_GUEST_MODE, is used for
this purpose.

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts.
To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.

Request-less VCPU Kicks
-----------------------

Because the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, request-less VCPU kicks are
almost never correct.  Without the assurance that a non-IPI generating
kick will still result in an action by the receiving VCPU, as the final
kvm_request_pending() check provides for request-accompanying kicks, the
kick may not do anything useful at all.  If, for instance, a request-less
kick was made to a VCPU that was just about to set its mode to
IN_GUEST_MODE, meaning no IPI is sent, then the VCPU thread may continue
its entry without actually having done whatever it was the kick was meant
to initiate.

One exception is x86's posted interrupt mechanism.  In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``.
When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken.  One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request.  However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread execute serially (for example, when
they are the same thread, or when they use some form of concurrency
control to temporarily execute synchronously), it's possible to know that
the request may be cleared immediately, rather than waiting for the
receiving VCPU thread to handle the request in VCPU RUN.
The only current
examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves.  A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.

References
==========

.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/