History log of /openbmc/qemu/hw/ppc/spapr_numa.c (Results 1 – 25 of 35)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 44fa20c9 18-Sep-2023 Cédric Le Goater <clg@redhat.com>

spapr: Remove support for NVIDIA V100 GPU with NVLink2

NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

562d1e207d32 ("powerpc/powernv: remove the n

spapr: Remove support for NVIDIA V100 GPU with NVLink2

NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

562d1e207d32 ("powerpc/powernv: remove the nvlink support")
b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")

This was 2.5 years ago. Do the same in QEMU with a revert of commit
ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some
adjustements are required on the NUMA part.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230918091717.149950-1-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

show more ...


Revision tags: v8.0.0, v7.2.0, v7.0.0
# 0f9668e0 23-Mar-2022 Marc-André Lureau <marcandre.lureau@redhat.com>

Remove qemu-common.h include from most units

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo B

Remove qemu-common.h include from most units

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# b21e2380 15-Mar-2022 Markus Armbruster <armbru@redhat.com>

Use g_new() & friends where that makes obvious sense

g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, i

Use g_new() & friends where that makes obvious sense

g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).

Patch created mechanically with:

$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

show more ...


# 16282937 01-Mar-2022 Daniel Henrique Barboza <danielhb413@gmail.com>

hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()

We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:

- 'cur_index = int_buf = g_malloc0(..)' is doin

hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()

We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:

- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the
'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a
single line. No problem with that, but splitting into 2 lines is clearer
to follow

- use g_autofree in 'int_buf' to avoid a g_free() call later on

- 'buf_len' is only being used to store the size of 'int_buf' malloc.
Remove the var and just use the value in g_malloc0() directly

- remove the 'ret' var and just return the result of fdt_setprop()

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-12-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

show more ...


Revision tags: v6.2.0
# 1fde73bc 10-Nov-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: fix FORM1 distance-less nodes

Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. Thi

spapr_numa.c: fix FORM1 distance-less nodes

Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.

During the discussions of the forementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 160 160 160
1: 160 10 160 160
2: 160 160 10 160
3: 160 160 160 10

Turns out that we have the same problem with FORM1 guests - we are
calculating associativity domain using zeroed values. And as it also
turns out, the solution from 71e6fae3a99 applies to FORM1 as well.

This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.

After this patch, the FORM1 guest mentioned above will have the
following topology:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10

This is compatible with what FORM2 guests and other archs do in this
case.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html

Fixes: 690fbe4295d5 ("spapr_numa: consider user input when defining associativity")
CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
CC: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

show more ...


# 71e6fae3 05-Nov-2021 Nicholas Piggin <npiggin@gmail.com>

spapr_numa.c: FORM2 table handle nodes with no distance info

A configuration that specifies multiple nodes without distance info
results in the non-local points in the FORM2 matrix having a distance

spapr_numa.c: FORM2 table handle nodes with no distance info

A configuration that specifies multiple nodes without distance info
results in the non-local points in the FORM2 matrix having a distance of
0. This causes Linux to complain "Invalid distance value range" because
a node distance is smaller than the local distance.

Fix this by building a simple local / remote fallback for points where
distance information is missing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20211105135137.1584840-1-npiggin@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 28d86252 22-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()

This patch has a handful of modifications for the recent added
FORM2 support:

- to not allocate more than the necessary size in 'distance

spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()

This patch has a handful of modifications for the recent added
FORM2 support:

- to not allocate more than the necessary size in 'distance_table'.
At this moment the array is oversized due to allocating uint32_t for
all elements, when most of them fits in an uint8_t. Fix it by
changing the array to uint8_t and allocating the exact size;

- use stl_be_p() to store the uint32_t at the start of 'distance_table';

- use sizeof(uint32_t) to skip the uint32_t length when populating the
distances;

- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding
the local distance value.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210922122852.130054-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 0d5ba481 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: handle auto NUMA node with no distance info

numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node dist

spapr_numa.c: handle auto NUMA node with no distance info

numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node distance
information for the single node created.

NUMA FORM1 affinity code didn't rely on numa_state information to do its
job, but FORM2 does. As is now, this is the result of a pSeries guest
with NUMA FORM2 affinity when no NUMA nodes is specified:

$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15681 MB
No distance information available.

This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
enforcing that the local distance (the distance to the node to itself) is
always 10. This allows for the proper creation of the NUMA distance tables,
fixing the output of 'numactl -H' in the guest:

$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15685 MB
node distances:
node 0
0: 10

CC: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-8-danielhb413@gmail.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# e0eb84d4 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: FORM2 NUMA affinity support

The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and strai

spapr_numa.c: FORM2 NUMA affinity support

The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 5dab5abe 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr: move FORM1 verifications to post CAS

FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has

spapr: move FORM1 verifications to post CAS

FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# a165ac67 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array

Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
struc

spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array

Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.

The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.

We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.

This will allow for an easier plug of FORM2 data later on.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 3a6e4ce6 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: parametrize FORM1 macros

The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, N

spapr_numa.c: parametrize FORM1 macros

The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:

- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;

- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;

- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# afa3b3c9 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: scrap 'legacy_numa' concept

When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones wit

spapr_numa.c: scrap 'legacy_numa' concept

When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be
affected by it because it would be an userspace change. Calling these
cases 'legacy_numa' was a convenient way to label these cases.

We're about to introduce a new NUMA affinity, FORM2, and this concept
of 'legacy_numa' is now a bit misleading because, although it is called
'legacy' it is in fact a FORM1 exclusive contraint.

This patch removes spapr_machine_using_legacy_numa() and open code the
conditions in each caller. While we're at it, move the chunk inside
spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains
with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was
being executed if !pre_5_2_numa_associativity and num_nodes => 1, the
same conditions in which spapr_numa_define_FORM1_domains() is called
shortly after.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# d98dbe2a 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: split FORM1 code into helpers

The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1

spapr_numa.c: split FORM1 code into helpers

The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1 guests.

We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.

Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


Revision tags: v6.1.0
# f4ceebde 13-Feb-2021 Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-6.0-pull-request' into staging

Pull request m68k-20210212

Move bootinfo headers to include/standard-headers/asm-m68k
A

Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-6.0-pull-request' into staging

Pull request m68k-20210212

Move bootinfo headers to include/standard-headers/asm-m68k
Add M68K_FEATURE_MSP, M68K_FEATURE_MOVEC, M68K_FEATURE_M68010
Add 68060 CR BUSCR and PCR (unimplemented)
CPU types and features cleanup

# gpg: Signature made Fri 12 Feb 2021 21:14:28 GMT
# gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg: issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-for-6.0-pull-request:
m68k: import bootinfo headers from linux
m68k: add MSP detection support for stack pointer swap helpers
m68k: MOVEC insn. should generate exception if wrong CR is accessed
m68k: add missing BUSCR/PCR CR defines, and BUSCR/PCR/CAAR CR to m68k_move_to/from
m68k: improve comments on m68k_move_to/from helpers
m68k: cascade m68k_features by m680xx_cpu_initfn() to improve readability
m68k: improve cpu instantiation comments

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


# 83339e21 10-Feb-2021 Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging

Pull request

v4:
* Add PCI_EXPRESS Kconfig dependency to fix s390x in "multi-process

Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging

Pull request

v4:
* Add PCI_EXPRESS Kconfig dependency to fix s390x in "multi-process: setup PCI
host bridge for remote device" [Philippe and Thomas]

# gpg: Signature made Wed 10 Feb 2021 09:26:14 GMT
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha-gitlab/tags/block-pull-request: (27 commits)
docs: fix Parallels Image "dirty bitmap" section
multi-process: perform device reset in the remote process
multi-process: Retrieve PCI info from remote process
multi-process: create IOHUB object to handle irq
multi-process: Synchronize remote memory
multi-process: PCI BAR read/write handling for proxy & remote endpoints
multi-process: Forward PCI config space acceses to the remote process
multi-process: add proxy communication functions
multi-process: introduce proxy object
multi-process: setup memory manager for remote device
multi-process: Associate fd of a PCIDevice with its object
multi-process: Initialize message handler in remote device
multi-process: define MPQemuMsg format and transmission functions
io: add qio_channel_readv_full_all_eof & qio_channel_readv_full_all helpers
io: add qio_channel_writev_full_all helper
multi-process: setup a machine object for remote device process
multi-process: setup PCI host bridge for remote device
multi-process: Add config option for multi-process QEMU
memory: alloc RAM from file at offset
multi-process: add configure and usage information
...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


# b01fec36 28-Jan-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: fix ibm,max-associativity-domains calculation

The current logic for calculating 'maxdomain' making it a sum of
numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_num

spapr_numa.c: fix ibm,max-associativity-domains calculation

The current logic for calculating 'maxdomain' making it a sum of
numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is
used as a index to determine the next available NUMA id that a
given NVGPU can use.

The problem is that the initial value of gpu_numa_id, for any topology
that has more than one NUMA node, is equal to numa_state->num_nodes.
This means that our maxdomain will always be, at least, twice the
amount of existing NUMA nodes. This means that a guest with 4 NUMA
nodes will end up with the following max-associativity-domains:

rtas/ibm,max-associativity-domains
00000004 00000008 00000008 00000008 00000008

This overtuning of maxdomains doesn't go unnoticed in the guest, being
detected in SLUB during boot:

dmesg | grep SLUB
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8

SLUB is detecting 8 total nodes, with 4 nodes being online.

This patch fixes ibm,max-associativity-domains by considering the amount
of NVGPUs NUMA nodes presented in the guest, instead of just
spapr->gpu_numa_id.

Reported-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 66407069 28-Jan-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper

We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easi

spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper

We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easier access and to
avoid repetition.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 3b880445 28-Jan-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c

This function is used only in spapr_numa.c.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <grou

spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c

This function is used only in spapr_numa.c.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


Revision tags: v5.2.0
# 4a7c0bd9 09-Oct-2020 Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20201009' into staging

ppc patch queue 2020-10-09

Here's the next set of ppc related patches for qemu-5.2. There are

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.2-20201009' into staging

ppc patch queue 2020-10-09

Here's the next set of ppc related patches for qemu-5.2. There are
two main things here:

* Cleanups to error handling in spapr from Greg Kurz
* Improvements to NUMA handling for spapr from Daniel Barboza

There are also a handful of other bugfixes.

# gpg: Signature made Fri 09 Oct 2020 07:02:29 BST
# gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-5.2-20201009:
specs/ppc-spapr-numa: update with new NUMA support
spapr_numa: consider user input when defining associativity
spapr_numa: change reference-points and maxdomain settings
spapr_numa: forbid asymmetrical NUMA setups
spapr: add spapr_machine_using_legacy_numa() helper
ppc/pnv: Increase max firmware size
spapr: Add a return value to spapr_check_pagesize()
spapr: Add a return value to spapr_nvdimm_validate()
spapr: Simplify error handling in spapr_cpu_core_realize()
spapr: Add a return value to spapr_set_vcpu_id()
spapr: Simplify error handling in prop_get_fdt()
spapr: Add a return value to spapr_drc_attach()
spapr: Simplify error handling in spapr_vio_busdev_realize()
spapr: Simplify error handling in do_client_architecture_support()
spapr: Get rid of cas_check_pvr() error reporting
spapr: Simplify error handling in callers of ppc_set_compat()
ppc: Fix return value in cpu_post_load() error path
ppc: Add a return value to ppc_set_compat() and ppc_set_compat_all()
spapr: Fix error leak in spapr_realize_vcpu()
spapr: Handle HPT allocation failure in nested guest

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


# 690fbe42 07-Oct-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: consider user input when defining associativity

A new function called spapr_numa_define_associativity_domains()
is created to calculate the associativity domains and change

spapr_numa: consider user input when defining associativity

A new function called spapr_numa_define_associativity_domains()
is created to calculate the associativity domains and change
the associativity arrays considering user input. This is how
the associativity domain between two NUMA nodes A and B is
calculated:

- get the distance D between them

- get the correspondent NUMA level 'n_level' for D. This is done
via a helper called spapr_numa_get_numa_level()

- all associativity arrays were initialized with their own
numa_ids, and we're calculating the distance in node_id ascending
order, starting from node id 0 (the first node retrieved by
numa_state). This will have a cascade effect in the algorithm because
the associativity domains that node 0 defines will be carried over to
other nodes, and node 1 associativities will be carried over after
taking node 0 associativities into account, and so on. This
happens because we'll assign assoc_src as the associativity domain
of dst as well, for all NUMA levels beyond and including n_level.

The PPC kernel expects the associativity domains of the first node
(node id 0) to be always 0 [1], and this algorithm will grant that
by default.

Ultimately, all of this results in a best effort approximation for
the actual NUMA distances the user input in the command line. Given
the nature of how PAPR itself interprets NUMA distances versus the
expectations risen by how ACPI SLIT works, there might be better
algorithms but, in the end, it'll also result in another way to
approximate what the user really wanted.

To keep this commit message no longer than it already is, the next
patch will update the existing documentation in ppc-spapr-numa.rst
with more in depth details and design considerations/drawbacks.

[1] https://lore.kernel.org/linuxppc-dev/5e8fbea3-8faf-0951-172a-b41a2138fbcf@gmail.com/

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 491e884e 07-Oct-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: change reference-points and maxdomain settings

This is the first guest visible change introduced in
spapr_numa.c. The previous settings of both reference-points
and maxdo

spapr_numa: change reference-points and maxdomain settings

This is the first guest visible change introduced in
spapr_numa.c. The previous settings of both reference-points
and maxdomains were too restrictive, but enough for the
existing associativity we're setting in the resources.

We'll change that in the following patches, populating the
associativity arrays based on user input. For those changes
to be effective, reference-points and maxdomains must be
more flexible. After this patch, we'll have 4 distinct
levels of NUMA (0x4, 0x3, 0x2, 0x1) and maxdomains will
allow for any type of configuration the user intends to
do - under the scope and limitations of PAPR itself, of
course.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# ee6635b2 07-Oct-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: forbid asymmetrical NUMA setups

The pSeries machine does not support asymmetrical NUMA
configurations. This doesn't make much of a different
since we're not using user in

spapr_numa: forbid asymmetrical NUMA setups

The pSeries machine does not support asymmetrical NUMA
configurations. This doesn't make much of a different
since we're not using user input for pSeries NUMA setup,
but this will change in the next patches.

To avoid breaking existing setups, gate this change by
checking for legacy NUMA support.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 2499453e 11-Sep-2020 Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qemu-img create: Fail gracefully when backing file is an empty string
- Fixes related to filter block nodes ("Deal with filters" series)
- block/nvme: Various cleanups required to use multiple queues
- block/nvme: Use NvmeBar structure from "block/nvme.h"
- file-win32: Fix "locking" option
- iotests: Allow running from different directory

# gpg: Signature made Thu 10 Sep 2020 10:11:19 BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (65 commits)
block/qcow2-cluster: Add missing "fallthrough" annotation
block/nvme: Pair doorbell registers
block/nvme: Use generic NvmeBar structure
block/nvme: Group controller registers in NVMeRegs structure
file-win32: Fix "locking" option
iotests: Allow running from different directory
iotests: Test committing to overridden backing
iotests: Add test for commit in sub directory
iotests: Add filter mirror test cases
iotests: Add filter commit test cases
iotests: Let complete_and_wait() work with commit
iotests: Test that qcow2's data-file is flushed
block: Leave BDS.backing_{file,format} constant
block: Inline bdrv_co_block_status_from_*()
blockdev: Fix active commit choice
block: Drop backing_bs()
qemu-img: Use child access functions
nbd: Use CAF when looking for dirty bitmap
commit: Deal with filters
backup: Deal with filters
...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


# 9435a8b3 08-Sep-2020 Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kraxel/tags/sirius/ipxe-20200908-pull-request' into staging

ipxe: update to aug 2020 snapshot.

# gpg: Signature made Tue 08 Sep 2020 07:09:26 B

Merge remote-tracking branch 'remotes/kraxel/tags/sirius/ipxe-20200908-pull-request' into staging

ipxe: update to aug 2020 snapshot.

# gpg: Signature made Tue 08 Sep 2020 07:09:26 BST
# gpg: using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/sirius/ipxe-20200908-pull-request:
ipxe: update binaries
ipxe: drop ia32 efi roms
ipxe: update submodule

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


12