=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity.  For example, a thread may request a VCPU to flush
its TLB with a VCPU request.  The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request.  This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
a guest mode exit.  However, a VCPU thread may not be in guest mode at the
time of the kick.  Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.  All three actions
are listed below:

1) Send an IPI.  This forces a guest mode exit.
2) Wake a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues.  Waking them removes the threads from
   the waitqueues, allowing the threads to run again.  This behavior
   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing.
   When the VCPU is not in guest mode and the VCPU thread is not
   sleeping, then there is nothing to do.

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
guest is running in guest mode or not, as well as some specific
outside guest mode states.  The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements").  The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.  The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_MMU_RELOAD

  When shadow page tables are used and memory slots are removed it's
  necessary to inform each VCPU to completely refresh the tables.  This
  request is used for that.

KVM_REQ_PENDING_TIMER

  This request may be made from a timer handler run on the host on behalf
  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.

KVM_REQ_UNHALT

  This request may be made from the KVM common function kvm_vcpu_block(),
  which is used to emulate an instruction that causes a CPU to halt until
  one of an architecture-specific set of events and/or interrupts is
  received (determined by checking kvm_arch_vcpu_runnable()).  When that
  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
  in contrast to when kvm_vcpu_block() returns due to any other reason,
  such as a pending signal, which does not indicate the VCPU's halt
  emulation should stop, and therefore does not make the request.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops.  This is because only the lower 8 bits are used to represent the
request's number.  The upper bits are used as flags.  Currently only two
flags are defined.

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode.  That is, sleeping VCPUs do not need
  to be awakened for these requests.  Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  then the caller will wait for each VCPU to acknowledge its IPI before
  proceeding.  This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait.  This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP.  See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request.
This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit.  Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it.  See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request.  We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode.
This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI.  One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU.  With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check.  This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_).  As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.  Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts.  This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.

IPI Reduction
-------------

Because only one IPI is needed to get a VCPU to check for any/all
requests, multiple IPIs may be coalesced.  This is easily done by having
the first IPI-sending kick also change the VCPU mode to something
!IN_GUEST_MODE.  The transitional state, EXITING_GUEST_MODE, is used for
this purpose.

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts.
To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.

Request-less VCPU Kicks
-----------------------

Because the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, request-less VCPU kicks are
almost never correct.  Without the assurance that a non-IPI generating
kick will still result in an action by the receiving VCPU, as the final
kvm_request_pending() check provides for request-accompanying kicks, the
kick may not do anything useful at all.  If, for instance, a request-less
kick was made to a VCPU that was just about to set its mode to
IN_GUEST_MODE, meaning no IPI is sent, then the VCPU thread may continue
its entry without actually having done whatever it was the kick was meant
to initiate.

One exception is x86's posted interrupt mechanism.  In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``.
When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken.  One reason to do so is to provide
architectures a function where requests may be checked if necessary.

Clearing Requests
-----------------

Generally it only makes sense for the receiving VCPU thread to clear a
request.  However, in some circumstances, such as when the requesting
thread and the receiving VCPU thread execute serially (for example, when
they are the same thread, or when they use some form of concurrency
control to temporarily execute synchronously), it's possible to know that
the request may be cleared immediately, rather than waiting for the
receiving VCPU thread to handle the request in VCPU RUN.
The only current
examples of this are kvm_vcpu_block() calls made by VCPUs to block
themselves.  A possible side-effect of that call is to make the
KVM_REQ_UNHALT request, which may then be cleared immediately when the
VCPU returns from the call.

References
==========

.. [atomic-ops] Documentation/core-api/atomic_ops.rst
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/