cf2181ae | 16-Sep-2024 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
hw/acpi: Make storage of node id uint32_t to reduce fragility
From review of generic port introduction.
The value is handled as a uint32_t, so store it in that type. The value cannot in reality exceed MAX_NODES, which is currently 128, but if the types are matched there is no need to rely on that restriction.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174237.1843213-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
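For illustration only (not part of the commit), a minimal sketch of the type matching described above, with a hypothetical struct and field name:

    #include <stdint.h>

    /* Hypothetical layout: the node id is handled as a uint32_t elsewhere,
     * so storing it in the same width avoids implicit narrowing and any
     * reliance on MAX_NODES (currently 128) fitting into a smaller type. */
    typedef struct ExampleGenericPortState {
        uint32_t node;   /* previously a narrower type; now matches its users */
    } ExampleGenericPortState;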
a82fe829 | 16-Sep-2024 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
hw/acpi: Generic Port Affinity Structure support
These are very similar to the recently added Generic Initiators but instead of representing an initiator of memory traffic they represent an edge point beyond which may lie either targets or initiators. Here we add these ports such that they may be targets of hmat_lb records to describe the latency and bandwidth from host side initiators to the port. A discoverable mechanism such as UEFI CDAT read from CXL devices and switches is used to discover the remainder of the path, and the OS can build up full latency and bandwidth numbers as needed for work and data placement decisions.
Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174122.1843197-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
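For orientation (not part of the commit), a plausible C view of the SRAT Generic Port Affinity Structure, assuming it mirrors the Generic Initiator Affinity Structure layout as in recent ACPI revisions; the type value, field widths, and names are the editor's reading of the spec and should be checked against ACPI and QEMU's SRAT builder:

    #include <stdint.h>

    /* Assumed layout: SRAT subtable believed to be type 6 (Generic Port
     * Affinity), sharing the Generic Initiator layout. Not taken from the
     * patch itself. */
    struct __attribute__((packed)) example_generic_port_affinity {
        uint8_t  type;                  /* Generic Port Affinity subtable */
        uint8_t  length;                /* 32 */
        uint8_t  reserved1;
        uint8_t  device_handle_type;    /* 0: ACPI device, 1: PCI device */
        uint32_t proximity_domain;      /* node the port belongs to */
        uint8_t  device_handle[16];     /* e.g. ACPI HID/UID or segment+BDF */
        uint32_t flags;                 /* bit 0: enabled */
        uint32_t reserved2;
    };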
f74e7822 | 16-Sep-2024 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
acpi/pci: Move Generic Initiator object handling into acpi/pci.*
Whilst ACPI SRAT Generic Initiator Affinity Structures are able to refer to both PCI and ACPI Device Handles, the QEMU implementation only implements the PCI Device Handle case. For now move the code into the existing hw/acpi/pci.c file and header. If support for ACPI Device Handles is added in the future, perhaps this will be moved again.
Also push the struct AcpiGenericInitiator down into the C file, as it is not used outside pci.c.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
efdb43b8 | 16-Jul-2024 |
Salil Mehta <salil.mehta@huawei.com> |
hw/acpi: Update CPUs AML with cpu-(ctrl)dev change
The CPUs control device (\\_SB.PCI0) register interface for the x86 arch is IO-port based, and the existing CPUs AML code assumes the _CRS object evaluates to a system resource that describes an IO port address. But on the ARM arch the CPUs control device (\\_SB.PRES) register interface is memory-mapped, hence the _CRS object should evaluate to a system resource that describes a memory-mapped base address. Update the build CPUs AML function to accept both IO and MEMORY region spaces and update the _CRS object accordingly.
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-6-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
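For illustration (not the patch itself), a sketch of how a _CRS resource could be emitted for either an IO-port or a memory-mapped CPU control register block using QEMU's AML build helpers from "hw/acpi/aml-build.h"; the function name and reg_len parameter are illustrative:

    #include "qemu/osdep.h"
    #include "hw/acpi/aml-build.h"

    /* Emit a _CRS describing either an IO port window (x86) or a fixed
     * memory-mapped window (ARM), depending on the region space passed by
     * the caller of build_cpus_aml(). Illustrative helper only. */
    static void example_cpus_build_crs(Aml *cpu_ctrl_dev, AmlRegionSpace rs,
                                       uint32_t base, uint32_t reg_len)
    {
        Aml *crs = aml_resource_template();

        if (rs == AML_SYSTEM_IO) {
            aml_append(crs, aml_io(AML_DECODE16, base, base, 1, reg_len));
        } else {
            aml_append(crs, aml_memory32_fixed(base, reg_len, AML_READ_WRITE));
        }
        aml_append(cpu_ctrl_dev, aml_name_decl("_CRS", crs));
    }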
549c9a9d | 16-Jul-2024 |
Salil Mehta <salil.mehta@huawei.com> |
hw/acpi: Update GED _EVT method AML with CPU scan
OSPM evaluates the _EVT method to map the event. The CPU hotplug event eventually results in the start of the CPU scan. The scan figures out the CPU and the kind of event (plug/unplug) and notifies it back to the guest. Update the GED AML _EVT method with a call to the method \\_SB.CPUS.CSCN (via \\_SB.GED.CSCN).
Architecture-specific code [1] might initialize its CPUs AML code by calling the common function build_cpus_aml(), like below for ARM:
build_cpus_aml(scope, ms, opts, xx_madt_cpu_entry, memmap[VIRT_CPUHP_ACPI].base, "\\_SB", "\\_SB.GED.CSCN", AML_SYSTEM_MEMORY);
[1] https://lore.kernel.org/qemu-devel/20240613233639.202896-13-salil.mehta@huawei.com/
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-5-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
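For illustration (not the patch itself), a sketch of the kind of _EVT branch described above, assuming the GED event selector has already been read into an AML operand; the event bit value is an assumption, and the method path mirrors the \\_SB.GED.CSCN name mentioned in the message:

    #include "qemu/osdep.h"
    #include "hw/acpi/aml-build.h"

    #define EXAMPLE_GED_CPU_HOTPLUG_EVT  0x8   /* assumed bit value */

    /* Append an _EVT branch that calls the CPU scan method when the CPU
     * hotplug bit is set in the GED event selector. Illustrative only. */
    static void example_ged_evt_add_cpu_scan(Aml *evt_method, Aml *evt_sel)
    {
        Aml *if_ctx = aml_if(aml_equal(
            aml_and(evt_sel, aml_int(EXAMPLE_GED_CPU_HOTPLUG_EVT), NULL),
            aml_int(EXAMPLE_GED_CPU_HOTPLUG_EVT)));

        aml_append(if_ctx, aml_call0("\\_SB.GED.CSCN"));
        aml_append(evt_method, if_ctx);
    }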
06f1f495 | 16-Jul-2024 |
Salil Mehta <salil.mehta@huawei.com> |
hw/acpi: Update ACPI GED framework to support vCPU Hotplug
ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the _CRS object of GED to notify OSPM about an event. OSPM then demultiplexes the notified event by evaluating the ACPI _EVT method to determine the type of event. Use ACPI GED to also notify the guest kernel about any CPU hot(un)plug events.
Note, the GED interface is used by many hotplug events like memory hotplug and NVDIMM hotplug, and by non-hotplug events like the system power-down event. Each of these can be selected using a bit in the 32-bit GED IO interface. A bit has been reserved for the CPU hotplug event.
ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG support has been enabled for a particular architecture. Add a cpu_hotplug_hw_init() stub to avoid a compilation break.
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20240716111502.202344-4-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
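For illustration (not part of the commit), a sketch of the per-event selector bits behind the 32-bit GED IO interface mentioned above; the names follow the style of QEMU's generic_event_device.h, but the exact values here are the editor's assumption:

    /* One selector bit per event source; the guest's _EVT method tests
     * these bits to demultiplex the notification. Values are assumed. */
    #define EXAMPLE_GED_MEM_HOTPLUG_EVT     0x1
    #define EXAMPLE_GED_PWR_DOWN_EVT        0x2
    #define EXAMPLE_GED_NVDIMM_HOTPLUG_EVT  0x4
    #define EXAMPLE_GED_CPU_HOTPLUG_EVT     0x8   /* bit reserved by this series */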
0a5b5acd | 08-Mar-2024 |
Ankit Agrawal <ankita@nvidia.com> |
hw/acpi: Implement the SRAT GI affinity structure
The ACPI spec provides a scheme to associate "Generic Initiators" [1] (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with integrated compute or DMA engines) with Proximity Domains. This is achieved using the Generic Initiator Affinity Structure in SRAT. During bootup, the Linux kernel parses the ACPI SRAT to determine the PXM ids and creates a NUMA node for each unique PXM ID encountered. QEMU currently does not implement these structures while building SRAT.
Add GI structures while building the VM ACPI SRAT. The association between device and node is stored using the acpi-generic-initiator object. Look up all such objects and use them to build these structures.
The structure needs a PCI device handle [2] that consists of the device BDF. The vfio-pci device corresponding to the acpi-generic-initiator object is located to determine the BDF.
[1] ACPI Spec 6.3, Section 5.2.16.6 [2] ACPI Spec 6.3, Table 5.80
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cedric Le Goater <clg@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-3-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
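For illustration (not the patch itself), a sketch of filling the 16-byte PCI device handle from a located PCI device, per the editor's reading of ACPI 6.3 Table 5.80 (2-byte segment, 2-byte BDF, 12 reserved bytes, little-endian); the helper names come from QEMU's "hw/pci/pci.h", and the zero segment is an assumption:

    #include "qemu/osdep.h"
    #include "hw/pci/pci.h"

    /* Fill a 16-byte PCI device handle. Sketch only; the actual patch
     * derives the BDF from the vfio-pci device referenced by the
     * acpi-generic-initiator object. */
    static void example_fill_pci_device_handle(uint8_t handle[16],
                                               PCIDevice *dev)
    {
        uint16_t segment = 0;  /* assumption: segment 0 */
        uint16_t bdf = PCI_BUILD_BDF(pci_dev_bus_num(dev), dev->devfn);

        memset(handle, 0, 16);
        handle[0] = segment & 0xff;
        handle[1] = segment >> 8;
        handle[2] = bdf & 0xff;
        handle[3] = bdf >> 8;
    }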
b64b7ed8 | 08-Mar-2024 |
Ankit Agrawal <ankita@nvidia.com> |
qom: new object to associate device to NUMA node
NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows partitioning of the GPU device resources (including device memory) into several (up to 8) isolated instances. Each partitioned memory region needs a dedicated NUMA node to operate. The partitions are not fixed and can be created/deleted at runtime.
Unfortunately the Linux OS does not provide a means to dynamically create/destroy NUMA nodes, and implementing such a feature is not expected to be trivial. The nodes that the OS discovers at boot time while parsing SRAT remain fixed. So we utilize the Generic Initiator (GI) Affinity structures, which allow association between nodes and devices. Multiple GI structures per BDF are possible, allowing creation of multiple nodes by exposing a unique PXM in each of these structures.
Implement the mechanism to build the GI affinity structures, as QEMU currently does not. Introduce a new acpi-generic-initiator object to allow the host admin to link a device with an associated NUMA node. QEMU maintains this association and uses this object to build the requisite GI Affinity Structure.
When multiple NUMA nodes are associated with a device, an equal number of acpi-generic-initiator objects must be created, each representing a unique device:node association.
Following is one of the decoded GI affinity structures in the VM ACPI SRAT.
[0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201 1] Length : 20
[0CAh 0202 1] Reserved1 : 00
[0CBh 0203 1] Device Handle Type : 01
[0CCh 0204 4] Proximity Domain : 00000007
[0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00
[0E0h 0224 4] Flags (decoded below) : 00000001
Enabled : 1
[0E4h 0228 4] Reserved2 : 00000000
[0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233 1] Length : 20
An admin can provide a range of acpi-generic-initiator objects, each associating a device (by providing the id through the pci-dev argument) with the desired NUMA node (using the node argument). Currently, only PCI devices are supported.
For the Grace Hopper system, create a range of 8 nodes and associate them with the device using acpi-generic-initiator objects. While a configuration of fewer than 8 nodes per device is allowed, such a configuration will prevent utilization of the feature to the fullest. The following sample creates 8 nodes per PCI device for a VM with 2 PCI devices and links them to the respective PCI device using acpi-generic-initiator objects:
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \

-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
06680402 | 19-Feb-2024 |
Philippe Mathieu-Daudé <philmd@linaro.org> |
hw/acpi/ich9_tco: Include missing 'migration/vmstate.h' header
We need the VMStateDescription structure definition from "migration/vmstate.h" in order to declare vmstate_tco_io_sts.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20240219141412.71418-4-philmd@linaro.org>
b8492bd4 | 26-Jan-2024 |
Philippe Mathieu-Daudé <philmd@linaro.org> |
hw/acpi/cpu: Use CPUState typedef
QEMU coding style recommends using structure typedefs: https://www.qemu.org/docs/master/devel/style.html#typedefs
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20240126220407.95022-2-philmd@linaro.org>
c461f3e3 | 08-Sep-2023 |
Bernhard Beschow <shentey@gmail.com> |
hw/acpi/acpi_dev_interface: Remove now unused madt_cpu virtual method
This virtual method was always set to the x86-specific pc_madt_cpu_entry(), even in piix4, which is also used on MIPS. The previous changes use pc_madt_cpu_entry() directly instead, so madt_cpu can be dropped.
Since pc_madt_cpu_entry() is now only used in x86-specific code, the stub in hw/acpi/acpi-x86-stub can be removed as well.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>