#
44fa20c9 |
| 18-Sep-2023 |
Cédric Le Goater <clg@redhat.com> |
spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits :
562d1e207d32 ("powerpc/powernv: remove the n
spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits :
562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")
This was 2.5 years ago. Do the same in QEMU with a revert of commit ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
Revision tags: v8.0.0, v7.2.0, v7.0.0 |
|
#
0f9668e0 |
| 23-Mar-2022 |
Marc-André Lureau <marcandre.lureau@redhat.com> |
Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo B
Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
#
b21e2380 |
| 15-Mar-2022 |
Markus Armbruster <armbru@redhat.com> |
Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, i
Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors.
This commit only touches allocations with size arguments of the form sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
show more ...
|
#
16282937 |
| 01-Mar-2022 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()
We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doin
hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()
We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the 'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a single line. No problem with that, but splitting into 2 lines is clearer to follow
- use g_autofree in 'int_buf' to avoid a g_free() call later on
- 'buf_len' is only being used to store the size of 'int_buf' malloc. Remove the var and just use the value in g_malloc0() directly
- remove the 'ret' var and just return the result of fdt_setprop()
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220228175004.8862-12-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
show more ...
|
Revision tags: v6.2.0 |
|
#
1fde73bc |
| 10-Nov-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: fix FORM1 distance-less nodes
Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA nodes in which the distance info is absent in machine_state->numa_state->nodes. Thi
spapr_numa.c: fix FORM1 distance-less nodes
Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA nodes in which the distance info is absent in machine_state->numa_state->nodes. This happens when QEMU adds a default NUMA node and when the user adds NUMA nodes without specifying the distances.
During the discussions of the forementioned patch [1] it was found that FORM1 guests were behaving in a strange way in the same scenario, with the kernel seeing the distances between the nodes as '160', as we can see in this example with 4 NUMA nodes without distance information:
$ numactl -H available: 4 nodes (0-3) (...) node distances: node 0 1 2 3 0: 10 160 160 160 1: 160 10 160 160 2: 160 160 10 160 3: 160 160 160 10
Turns out that we have the same problem with FORM1 guests - we are calculating associativity domain using zeroed values. And as it also turns out, the solution from 71e6fae3a99 applies to FORM1 as well.
This patch creates a wrapper called 'get_numa_distance' that contains the logic used in FORM2 to define node distances when this information is absent. This helper is then used in all places where we need to read distance information from machine_state->numa_state->nodes. That way we'll guarantee that the NUMA node distance is always being curated before being used.
After this patch, the FORM1 guest mentioned above will have the following topology:
$ numactl -H available: 4 nodes (0-3) (...) node distances: node 0 1 2 3 0: 10 20 20 20 1: 20 10 20 20 2: 20 20 10 20 3: 20 20 20 10
This is compatible with what FORM2 guests and other archs do in this case.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html
Fixes: 690fbe4295d5 ("spapr_numa: consider user input when defining associativity") CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> CC: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
show more ...
|
#
71e6fae3 |
| 05-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
spapr_numa.c: FORM2 table handle nodes with no distance info
A configuration that specifies multiple nodes without distance info results in the non-local points in the FORM2 matrix having a distance
spapr_numa.c: FORM2 table handle nodes with no distance info
A configuration that specifies multiple nodes without distance info results in the non-local points in the FORM2 matrix having a distance of 0. This causes Linux to complain "Invalid distance value range" because a node distance is smaller than the local distance.
Fix this by building a simple local / remote fallback for points where distance information is missing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20211105135137.1584840-1-npiggin@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
28d86252 |
| 22-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()
This patch has a handful of modifications for the recent added FORM2 support:
- to not allocate more than the necessary size in 'distance
spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()
This patch has a handful of modifications for the recent added FORM2 support:
- to not allocate more than the necessary size in 'distance_table'. At this moment the array is oversized due to allocating uint32_t for all elements, when most of them fits in an uint8_t. Fix it by changing the array to uint8_t and allocating the exact size;
- use stl_be_p() to store the uint32_t at the start of 'distance_table';
- use sizeof(uint32_t) to skip the uint32_t length when populating the distances;
- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding the local distance value.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210922122852.130054-2-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
0d5ba481 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: handle auto NUMA node with no distance info
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node for the pSeries machine if none was specified, but without node dist
spapr_numa.c: handle auto NUMA node with no distance info
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node for the pSeries machine if none was specified, but without node distance information for the single node created.
NUMA FORM1 affinity code didn't rely on numa_state information to do its job, but FORM2 does. As is now, this is the result of a pSeries guest with NUMA FORM2 affinity when no NUMA nodes is specified:
$ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB node 0 free: 15681 MB No distance information available.
This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're enforcing that the local distance (the distance to the node to itself) is always 10. This allows for the proper creation of the NUMA distance tables, fixing the output of 'numactl -H' in the guest:
$ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB node 0 free: 15685 MB node distances: node 0 0: 10
CC: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-8-danielhb413@gmail.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
e0eb84d4 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and strai
spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and straightforward NUMA distance assignment without relying on complex associations between several levels of NUMA via ibm,associativity matches. Another feature is its extensibility. This base support contains the facilities for NUMA distance assignment, but in the future more facilities will be added for latency, performance, bandwidth and so on.
This patch implements the base FORM2 affinity support as follows:
- the use of FORM2 associativity is indicated by using bit 2 of byte 5 of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1 or FORM2 affinity. Setting both forms will default to FORM2. We're not advertising FORM2 for pseries-6.1 and older machine versions to prevent guest visible changes in those;
- ibm,associativity-reference-points has a new semantic. Instead of being used to calculate distances via NUMA levels, it's now used to indicate the primary domain index in the ibm,associativity domain of each resource. In our case it's set to {0x4}, matching the position where we already place logical_domain_id;
- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table and ibm,numa-distance-table. The index table is used to list all the NUMA logical domains of the platform, in ascending order, and allows for spartial NUMA configurations (although QEMU ATM doesn't support that). ibm,numa-distance-table is an array that contains all the distances from the first NUMA node to all other nodes, then the second NUMA node distances to all other nodes and so on;
- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity() now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest selected FORM2 affinity during CAS.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-7-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
5dab5abe |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has
spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has a different latency than the original NUMA node from the regular memory. FORM2 is also able to deal with asymmetric NUMA distances gracefully, something that our FORM1 implementation doesn't do.
Move these FORM1 verifications to a new function and wait until after CAS, when we're sure that we're sticking with FORM1, to enforce them.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
a165ac67 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array
Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data struc
spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array
Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data structures and functions to be completely separated from the existing FORM1 code, allowing us to avoid adding new code that inherits the existing complexity of FORM1.
The idea of switching values used by the write_dt() functions in spapr_numa.c was already introduced in the previous patch, and the same approach will be used when dealing with the FORM1 and FORM2 arrays.
We can accomplish that by that by renaming the existing numa_assoc_array to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity data. A new helper get_associativity() is then introduced to be used by the write_dt() functions to retrieve the current ibm,associativity array of a given node, after considering affinity selection that might have been done during CAS. All code that was using numa_assoc_array now needs to retrieve the array by calling this function.
This will allow for an easier plug of FORM2 data later on.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
3a6e4ce6 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: parametrize FORM1 macros
The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, N
spapr_numa.c: parametrize FORM1 macros
The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch accomplishes that by doing the following:
- move the NUMA related macros from spapr.h to spapr_numa.c where they are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used to refer to the maximum number of NUMA nodes, including GPU nodes, that the machine can support;
- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific macros are used in FORM1 init functions;
- code that uses MAX_DISTANCE_REF_POINTS now retrieves the max_dist_ref_points value using get_max_dist_ref_points(). NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE is replaced by get_vcpu_assoc_size(). These functions are used by the generic device tree functions and h_home_node_associativity() and will allow them to switch between FORM1 and FORM2 without changing their core logic.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
afa3b3c9 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: scrap 'legacy_numa' concept
When first introduced, 'legacy_numa' was a way to refer to guests that either wouldn't be affected by associativity domain calculations, namely the ones wit
spapr_numa.c: scrap 'legacy_numa' concept
When first introduced, 'legacy_numa' was a way to refer to guests that either wouldn't be affected by associativity domain calculations, namely the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be affected by it because it would be an userspace change. Calling these cases 'legacy_numa' was a convenient way to label these cases.
We're about to introduce a new NUMA affinity, FORM2, and this concept of 'legacy_numa' is now a bit misleading because, although it is called 'legacy' it is in fact a FORM1 exclusive contraint.
This patch removes spapr_machine_using_legacy_numa() and open code the conditions in each caller. While we're at it, move the chunk inside spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was being executed if !pre_5_2_numa_associativity and num_nodes => 1, the same conditions in which spapr_numa_define_FORM1_domains() is called shortly after.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
d98dbe2a |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: split FORM1 code into helpers
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies and doesn't need be concerned with all the legacy support for older pseries FORM1
spapr_numa.c: split FORM1 code into helpers
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies and doesn't need be concerned with all the legacy support for older pseries FORM1 guests.
We're also not going to calculate associativity domains based on numa distance (via spapr_numa_define_associativity_domains) since the distances will be written directly into new DT properties.
Let's split FORM1 code into its own functions to allow for easier insertion of FORM2 logic later on.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
Revision tags: v6.1.0 |
|
#
f4ceebde |
| 13-Feb-2021 |
Peter Maydell <peter.maydell@linaro.org> |
Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-6.0-pull-request' into staging Pull request m68k-20210212 Move bootinfo headers to include/standard-headers/asm-m68k A
Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-6.0-pull-request' into staging Pull request m68k-20210212 Move bootinfo headers to include/standard-headers/asm-m68k Add M68K_FEATURE_MSP, M68K_FEATURE_MOVEC, M68K_FEATURE_M68010 Add 68060 CR BUSCR and PCR (unimplemented) CPU types and features cleanup # gpg: Signature made Fri 12 Feb 2021 21:14:28 GMT # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier/tags/m68k-for-6.0-pull-request: m68k: import bootinfo headers from linux m68k: add MSP detection support for stack pointer swap helpers m68k: MOVEC insn. should generate exception if wrong CR is accessed m68k: add missing BUSCR/PCR CR defines, and BUSCR/PCR/CAAR CR to m68k_move_to/from m68k: improve comments on m68k_move_to/from helpers m68k: cascade m68k_features by m680xx_cpu_initfn() to improve readability m68k: improve cpu instantiation comments Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
show more ...
|
#
83339e21 |
| 10-Feb-2021 |
Peter Maydell <peter.maydell@linaro.org> |
Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging Pull request v4: * Add PCI_EXPRESS Kconfig dependency to fix s390x in "multi-process
Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging Pull request v4: * Add PCI_EXPRESS Kconfig dependency to fix s390x in "multi-process: setup PCI host bridge for remote device" [Philippe and Thomas] # gpg: Signature made Wed 10 Feb 2021 09:26:14 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha-gitlab/tags/block-pull-request: (27 commits) docs: fix Parallels Image "dirty bitmap" section multi-process: perform device reset in the remote process multi-process: Retrieve PCI info from remote process multi-process: create IOHUB object to handle irq multi-process: Synchronize remote memory multi-process: PCI BAR read/write handling for proxy & remote endpoints multi-process: Forward PCI config space acceses to the remote process multi-process: add proxy communication functions multi-process: introduce proxy object multi-process: setup memory manager for remote device multi-process: Associate fd of a PCIDevice with its object multi-process: Initialize message handler in remote device multi-process: define MPQemuMsg format and transmission functions io: add qio_channel_readv_full_all_eof & qio_channel_readv_full_all helpers io: add qio_channel_writev_full_all helper multi-process: setup a machine object for remote device process multi-process: setup PCI host bridge for remote device multi-process: Add config option for multi-process QEMU memory: alloc RAM from file at offset multi-process: add configure and usage information ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
show more ...
|
#
b01fec36 |
| 28-Jan-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: fix ibm,max-associativity-domains calculation The current logic for calculating 'maxdomain' making it a sum of numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_num
spapr_numa.c: fix ibm,max-associativity-domains calculation The current logic for calculating 'maxdomain' making it a sum of numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is used as a index to determine the next available NUMA id that a given NVGPU can use. The problem is that the initial value of gpu_numa_id, for any topology that has more than one NUMA node, is equal to numa_state->num_nodes. This means that our maxdomain will always be, at least, twice the amount of existing NUMA nodes. This means that a guest with 4 NUMA nodes will end up with the following max-associativity-domains: rtas/ibm,max-associativity-domains 00000004 00000008 00000008 00000008 00000008 This overtuning of maxdomains doesn't go unnoticed in the guest, being detected in SLUB during boot: dmesg | grep SLUB [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8 SLUB is detecting 8 total nodes, with 4 nodes being online. This patch fixes ibm,max-associativity-domains by considering the amount of NVGPUs NUMA nodes presented in the guest, instead of just spapr->gpu_numa_id. Reported-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
66407069 |
| 28-Jan-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easi
spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easier access and to avoid repetition. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
3b880445 |
| 28-Jan-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c This function is used only in spapr_numa.c. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <grou
spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c This function is used only in spapr_numa.c. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
Revision tags: v5.2.0 |
|
#
4a7c0bd9 |
| 09-Oct-2020 |
Peter Maydell <peter.maydell@linaro.org> |
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20201009' into staging ppc patch queue 2020-10-09 Here's the next set of ppc related patches for qemu-5.2. There are
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20201009' into staging ppc patch queue 2020-10-09 Here's the next set of ppc related patches for qemu-5.2. There are two main things here: * Cleanups to error handling in spapr from Greg Kurz * Improvements to NUMA handling for spapr from Daniel Barboza There are also a handful of other bugfixes. # gpg: Signature made Fri 09 Oct 2020 07:02:29 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-5.2-20201009: specs/ppc-spapr-numa: update with new NUMA support spapr_numa: consider user input when defining associativity spapr_numa: change reference-points and maxdomain settings spapr_numa: forbid asymmetrical NUMA setups spapr: add spapr_machine_using_legacy_numa() helper ppc/pnv: Increase max firmware size spapr: Add a return value to spapr_check_pagesize() spapr: Add a return value to spapr_nvdimm_validate() spapr: Simplify error handling in spapr_cpu_core_realize() spapr: Add a return value to spapr_set_vcpu_id() spapr: Simplify error handling in prop_get_fdt() spapr: Add a return value to spapr_drc_attach() spapr: Simplify error handling in spapr_vio_busdev_realize() spapr: Simplify error handling in do_client_architecture_support() spapr: Get rid of cas_check_pvr() error reporting spapr: Simplify error handling in callers of ppc_set_compat() ppc: Fix return value in cpu_post_load() error path ppc: Add a return value to ppc_set_compat() and ppc_set_compat_all() spapr: Fix error leak in spapr_realize_vcpu() spapr: Handle HPT allocation failure in nested guest Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
show more ...
|
#
690fbe42 |
| 07-Oct-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: consider user input when defining associativity A new function called spapr_numa_define_associativity_domains() is created to calculate the associativity domains and change
spapr_numa: consider user input when defining associativity A new function called spapr_numa_define_associativity_domains() is created to calculate the associativity domains and change the associativity arrays considering user input. This is how the associativity domain between two NUMA nodes A and B is calculated: - get the distance D between them - get the correspondent NUMA level 'n_level' for D. This is done via a helper called spapr_numa_get_numa_level() - all associativity arrays were initialized with their own numa_ids, and we're calculating the distance in node_id ascending order, starting from node id 0 (the first node retrieved by numa_state). This will have a cascade effect in the algorithm because the associativity domains that node 0 defines will be carried over to other nodes, and node 1 associativities will be carried over after taking node 0 associativities into account, and so on. This happens because we'll assign assoc_src as the associativity domain of dst as well, for all NUMA levels beyond and including n_level. The PPC kernel expects the associativity domains of the first node (node id 0) to be always 0 [1], and this algorithm will grant that by default. Ultimately, all of this results in a best effort approximation for the actual NUMA distances the user input in the command line. Given the nature of how PAPR itself interprets NUMA distances versus the expectations risen by how ACPI SLIT works, there might be better algorithms but, in the end, it'll also result in another way to approximate what the user really wanted. To keep this commit message no longer than it already is, the next patch will update the existing documentation in ppc-spapr-numa.rst with more in depth details and design considerations/drawbacks. [1] https://lore.kernel.org/linuxppc-dev/5e8fbea3-8faf-0951-172a-b41a2138fbcf@gmail.com/ Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
491e884e |
| 07-Oct-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: change reference-points and maxdomain settings This is the first guest visible change introduced in spapr_numa.c. The previous settings of both reference-points and maxdo
spapr_numa: change reference-points and maxdomain settings This is the first guest visible change introduced in spapr_numa.c. The previous settings of both reference-points and maxdomains were too restrictive, but enough for the existing associativity we're setting in the resources. We'll change that in the following patches, populating the associativity arrays based on user input. For those changes to be effective, reference-points and maxdomains must be more flexible. After this patch, we'll have 4 distinct levels of NUMA (0x4, 0x3, 0x2, 0x1) and maxdomains will allow for any type of configuration the user intends to do - under the scope and limitations of PAPR itself, of course. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
ee6635b2 |
| 07-Oct-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: forbid asymmetrical NUMA setups The pSeries machine does not support asymmetrical NUMA configurations. This doesn't make much of a different since we're not using user in
spapr_numa: forbid asymmetrical NUMA setups The pSeries machine does not support asymmetrical NUMA configurations. This doesn't make much of a different since we're not using user input for pSeries NUMA setup, but this will change in the next patches. To avoid breaking existing setups, gate this change by checking for legacy NUMA support. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
2499453e |
| 11-Sep-2020 |
Peter Maydell <peter.maydell@linaro.org> |
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches: - qemu-img create: Fail gracefully when backing file is an empty string - Fixes related to filter block nodes ("Deal with filters" series) - block/nvme: Various cleanups required to use multiple queues - block/nvme: Use NvmeBar structure from "block/nvme.h" - file-win32: Fix "locking" option - iotests: Allow running from different directory # gpg: Signature made Thu 10 Sep 2020 10:11:19 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (65 commits) block/qcow2-cluster: Add missing "fallthrough" annotation block/nvme: Pair doorbell registers block/nvme: Use generic NvmeBar structure block/nvme: Group controller registers in NVMeRegs structure file-win32: Fix "locking" option iotests: Allow running from different directory iotests: Test committing to overridden backing iotests: Add test for commit in sub directory iotests: Add filter mirror test cases iotests: Add filter commit test cases iotests: Let complete_and_wait() work with commit iotests: Test that qcow2's data-file is flushed block: Leave BDS.backing_{file,format} constant block: Inline bdrv_co_block_status_from_*() blockdev: Fix active commit choice block: Drop backing_bs() qemu-img: Use child access functions nbd: Use CAF when looking for dirty bitmap commit: Deal with filters backup: Deal with filters ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
show more ...
|
#
9435a8b3 |
| 08-Sep-2020 |
Peter Maydell <peter.maydell@linaro.org> |
Merge remote-tracking branch 'remotes/kraxel/tags/sirius/ipxe-20200908-pull-request' into staging ipxe: update to aug 2020 snapshot. # gpg: Signature made Tue 08 Sep 2020 07:09:26 B
Merge remote-tracking branch 'remotes/kraxel/tags/sirius/ipxe-20200908-pull-request' into staging ipxe: update to aug 2020 snapshot. # gpg: Signature made Tue 08 Sep 2020 07:09:26 BST # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full] # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full] # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full] # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/sirius/ipxe-20200908-pull-request: ipxe: update binaries ipxe: drop ia32 efi roms ipxe: update submodule Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
show more ...
|