History log of /openbmc/linux/drivers/net/ethernet/ibm/ibmvnic.c (Results 1 – 25 of 1104)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45
# 6db541ae 09-Aug-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Ensure login failure recovery is safe from other resets

If a login request fails, the recovery process should be protected
against parallel resets. It is a known issue that freeing and
regi

ibmvnic: Ensure login failure recovery is safe from other resets

If a login request fails, the recovery process should be protected
against parallel resets. It is a known issue that freeing and
registering CRQ's in quick succession can result in a failover CRQ from
the VIOS. Processing a failover during login recovery is dangerous for
two reasons:
1. This will result in two parallel initialization processes, this can
cause serious issues during login.
2. It is possible that the failover CRQ is received but never executed.
We get notified of a pending failover through a transport event CRQ.
The reset is not performed until a INIT CRQ request is received.
Previously, if CRQ init fails during login recovery, then the ibmvnic
irq is freed and the login process returned error. If failover_pending
is true (a transport event was received), then the ibmvnic device
would never be able to process the reset since it cannot receive the
CRQ_INIT request due to the irq being freed. This leaved the device
in a inoperable state.

Therefore, the login failure recovery process must be hardened against
these possible issues. Possible failovers (due to quick CRQ free and
init) must be avoided and any issues during re-initialization should be
dealt with instead of being propagated up the stack. This logic is
similar to that of ibmvnic_probe().

Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-5-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 23cc5f66 09-Aug-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Do partial reset on login failure

Perform a partial reset before sending a login request if any of the
following are true:
1. If a previous request times out. This can be dangerous because

ibmvnic: Do partial reset on login failure

Perform a partial reset before sending a login request if any of the
following are true:
1. If a previous request times out. This can be dangerous because the
VIOS could still receive the old login request at any point after
the timeout. Therefore, it is best to re-register the CRQ's and
sub-CRQ's before retrying.
2. If the previous request returns an error that is not described in
PAPR. PAPR provides procedures if the login returns with partial
success or aborted return codes (section L.5.1) but other values
do not have a defined procedure. Previously, these conditions
just returned error from the login function rather than trying
to resolve the issue.
This can cause further issues since most callers of the login
function are not prepared to handle an error when logging in. This
improper cleanup can lead to the device being permanently DOWN'd.
For example, if the VIOS believes that the device is already logged
in then it will return INVALID_STATE (-7). If we never re-register
CRQ's then it will always think that the device is already logged
in. This leaves the device inoperable.

The partial reset involves freeing the sub-CRQs, freeing the CRQ then
registering and initializing a new CRQ and sub-CRQs. This essentially
restarts all communication with VIOS to allow for a fresh login attempt
that will be unhindered by any previous failed attempts.

Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-4-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# d78a671e 09-Aug-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Handle DMA unmapping of login buffs in release functions

Rather than leaving the DMA unmapping of the login buffers to the
login response handler, move this work into the login release func

ibmvnic: Handle DMA unmapping of login buffs in release functions

Rather than leaving the DMA unmapping of the login buffers to the
login response handler, move this work into the login release functions.
Previously, these functions were only used for freeing the allocated
buffers. This could lead to issues if there are more than one
outstanding login buffer requests, which is possible if a login request
times out.

If a login request times out, then there is another call to send login.
The send login function makes a call to the login buffer release
function. In the past, this freed the buffers but did not DMA unmap.
Therefore, the VIOS could still write to the old login (now freed)
buffer. It is for this reason that it is a good idea to leave the DMA
unmap call to the login buffers release function.

Since the login buffer release functions now handle DMA unmapping,
remove the duplicate DMA unmapping in handle_login_rsp().

Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 411c565b 09-Aug-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Unmap DMA login rsp buffer on send login fail

If the LOGIN CRQ fails to send then we must DMA unmap the response
buffer. Previously, if the CRQ failed then the memory was freed without
DMA

ibmvnic: Unmap DMA login rsp buffer on send login fail

If the LOGIN CRQ fails to send then we must DMA unmap the response
buffer. Previously, if the CRQ failed then the memory was freed without
DMA unmapping.

Fixes: c98d9cc4170d ("ibmvnic: send_login should check for crq errors")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-2-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# db17ba71 09-Aug-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Enforce stronger sanity checks on login response

Ensure that all offsets in a login response buffer are within the size
of the allocated response buffer. Any offsets or lengths that surpass

ibmvnic: Enforce stronger sanity checks on login response

Ensure that all offsets in a login response buffer are within the size
of the allocated response buffer. Any offsets or lengths that surpass
the allocation are likely the result of an incomplete response buffer.
In these cases, a full reset is necessary.

When attempting to login, the ibmvnic device will allocate a response
buffer and pass a reference to the VIOS. The VIOS will then send the
ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled
with data. If the ibmvnic device does not get a response in 20 seconds,
the old buffer is freed and a new login request is sent. With 2
outstanding requests, any LOGIN_RSP CRQ's could be for the older
login request. If this is the case then the login response buffer (which
is for the newer login request) could be incomplete and contain invalid
data. Therefore, we must enforce strict sanity checks on the response
buffer values.

Testing has shown that the `off_rxadd_buff_size` value is filled in last
by the VIOS and will be the smoking gun for these circumstances.

Until VIOS can implement a mechanism for tracking outstanding response
buffers and a method for mapping a LOGIN_RSP CRQ to a particular login
response buffer, the best ibmvnic can do in this situation is perform a
full reset.

Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.44
# 813f3662 04-Aug-2023 Yu Liao <liaoyu15@huawei.com>

ibmvnic: remove unused rc variable

gcc with W=1 reports
drivers/net/ethernet/ibm/ibmvnic.c:194:13: warning: variable 'rc' set but not used [-Wunused-but-set-variable]

ibmvnic: remove unused rc variable

gcc with W=1 reports
drivers/net/ethernet/ibm/ibmvnic.c:194:13: warning: variable 'rc' set but not used [-Wunused-but-set-variable]
^
This variable is not used so remove it.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308040609.zQsSXWXI-lkp@intel.com/
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37
# 48538ccb 28-Jun-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Do not reset dql stats on NON_FATAL err

All ibmvnic resets, make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_complete

ibmvnic: Do not reset dql stats on NON_FATAL err

All ibmvnic resets, make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_completed byte counters. These stats are used in Byte Queue
Limit (BQL) algorithms. The difference between these two stats tracks
the number of bytes currently sitting on the physical NIC. ibmvnic
increases the number of queued bytes though calls to
netdev_tx_sent_queue() in the drivers xmit function. When, VIOS reports
that it is done transmitting bytes, the ibmvnic device increases the
number of completed bytes through calls to netdev_tx_completed_queue().
It is important to note that the driver batches its transmit calls and
num_queued is increased every time that an skb is added to the next
batch, not necessarily when the batch is sent to VIOS for transmission.

Unlike other reset types, a NON FATAL reset will not flush the sub crq
tx buffers. Therefore, it is possible for the batched skb array to be
partially full. So if there is call to netdev_tx_reset_queue() when
re-opening the device, the value of num_queued (0) would not account
for the skb's that are currently batched. Eventually, when the batch
is sent to VIOS, the call to netdev_tx_completed_queue() would increase
num_completed to a value greater than the num_queued. This causes a
BUG_ON crash:

ibmvnic 30000002: Firmware reports error, cause: adapter problem.
Starting recovery...
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:27!
Oops: Exception in kernel mode, sig: 5
[....]
NIP dql_completed+0x28/0x1c0
LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
Call Trace:
ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
__handle_irq_event_percpu+0x98/0x270
---[ end trace ]---

Therefore, do not reset the dql stats when performing a NON_FATAL reset.

Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14
# 6f2ce45f 23-Feb-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Assign XPS map to correct queue index

When setting the XPS map value for TX queues, use the index of the
transmit queue.
Previously, the function was passing the index of the loop that iter

ibmvnic: Assign XPS map to correct queue index

When setting the XPS map value for TX queues, use the index of the
transmit queue.
Previously, the function was passing the index of the loop that iterates
over all queues (RX and TX). This was causing invalid XPS map values.

Fixes: 6831582937bd ("ibmvnic: Toggle between queue types in affinity mapping")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230223153944.44969-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9
# 68315829 27-Jan-2023 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Toggle between queue types in affinity mapping

Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
the IRQs for transmit queues then assigning all the IRQs for receive
qu

ibmvnic: Toggle between queue types in affinity mapping

Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
the IRQs for transmit queues then assigning all the IRQs for receive
queues. With multi-threaded processors, in a heavy RX or TX environment,
physical cores would either be overloaded or underutilized (due to the
IRQ assignment algorithm). This approach is sub-optimal because IRQs for
the same subprocess (RX or TX) would be bound to adjacent CPU numbers,
meaning they were more likely to be contending for the same core.

For example, in a system with 64 CPU's and 32 queues, the IRQs would
be bound to CPU in the following pattern:

IRQ type | CPU number
-----------------------
TX0 | 0-1
TX1 | 2-3
<etc>
RX0 | 32-33
RX1 | 34-35
<etc>

Observe that in SMT-8, the first 4 tx queues would be sharing the
same core.

A more optimal algorithm would balance the number RX and TX IRQ's across
the physical cores. Therefore, to increase performance, distribute RX and
TX IRQs across cores by alternating between assigning IRQs for RX and TX
queues to CPUs.
With a system with 64 CPUs and 32 queues, this results in the following
pattern:

IRQ type | CPU number
-----------------------
TX0 | 0-1
RX0 | 2-3
TX1 | 4-5
RX1 | 6-7
<etc>

Observe that in SMT-8, there is equal distribution of RX and TX IRQs
per core. In the above case, each core handles 2 TX and 2 RX IRQ's.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127214358.318152-1-nnac123@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79
# df8f66d0 10-Nov-2022 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Update XPS assignments during affinity binding

Transmit Packet Steering (XPS) maps cpu numbers to transmit
queues. By running the same connection on the same set of cpu's,
contention for th

ibmvnic: Update XPS assignments during affinity binding

Transmit Packet Steering (XPS) maps cpu numbers to transmit
queues. By running the same connection on the same set of cpu's,
contention for the queue and cache miss rate can be minimized.
When assigning a cpu mask for a tranmit queues irq number, assign
the same cpu mask as the set of cpu's that XPS should use for that
queue.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 92125c3a 10-Nov-2022 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints

When CPU's are added and removed, ibmvnic devices will reassign
hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD
t

ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints

When CPU's are added and removed, ibmvnic devices will reassign
hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD
to signal to ibmvnic devices that the CPU has been removed and it
is time to reset affinity hint assignments. On the other hand,
when CPU's are being added, add a state instance to
CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity
hints once the new CPU's are online. This implementation is based
on the virtio_net driver.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 44fbc1b6 10-Nov-2022 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Assign IRQ affinity hints to device queues

Assign affinity hints to ibmvnic device queue interrupts.
Affinity hints are assigned and removed during sub-crq init and
teardown, respectively.

ibmvnic: Assign IRQ affinity hints to device queues

Assign affinity hints to ibmvnic device queue interrupts.
Affinity hints are assigned and removed during sub-crq init and
teardown, respectively. This update should improve latency if
utilized as interrupt lines and processing are more equally
distributed among CPU's. This implementation is based on the
virtio_net driver.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.0.8, v5.15.78, v6.0.7, v5.15.77
# d6dd2fe7 31-Oct-2022 Nick Child <nnac123@linux.ibm.com>

ibmvnic: Free rwi on reset success

Free the rwi structure in the event that the last rwi in the list
processed successfully. The logic in commit 4f408e1fa6e1 ("ibmvnic:
retry reset if there are no o

ibmvnic: Free rwi on reset success

Free the rwi structure in the event that the last rwi in the list
processed successfully. The logic in commit 4f408e1fa6e1 ("ibmvnic:
retry reset if there are no other resets") introduces an issue that
results in a 32 byte memory leak whenever the last rwi in the list
gets processed.

Fixes: 4f408e1fa6e1 ("ibmvnic: retry reset if there are no other resets")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Link: https://lore.kernel.org/r/20221031150642.13356-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71
# b48b89f9 27-Sep-2022 Jakub Kicinski <kuba@kernel.org>

net: drop the weight argument from netif_napi_add

We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight arg

net: drop the weight argument from netif_napi_add

We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().

Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52
# 1b18f09d 02-Jul-2022 Rick Lindsley <ricklind@us.ibm.com>

ibmvnic: Properly dispose of all skbs during a failover.

During a reset, there may have been transmits in flight that are no
longer valid and cannot be fulfilled. Resetting and clearing the
queues

ibmvnic: Properly dispose of all skbs during a failover.

During a reset, there may have been transmits in flight that are no
longer valid and cannot be fulfilled. Resetting and clearing the
queues is insufficient; each skb also needs to be explicitly freed
so that upper levels are not left waiting for confirmation of a
transmit that will never happen. If this happens frequently enough,
the apparent backlog will cause TCP to begin "congestion control"
unnecessarily, culminating in permanently decreased throughput.

Fixes: d7c0ef36bde03 ("ibmvnic: Free and re-allocate scrqs when tx/rx scrqs change")
Tested-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Rick Lindsley <ricklind@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38
# 6bff3ffc 04-May-2022 Christophe Leroy <christophe.leroy@csgroup.eu>

net: ethernet: Prepare cleanup of powerpc's asm/prom.h

powerpc's asm/prom.h includes some headers that it doesn't
need itself.

In order to clean powerpc's asm/prom.h up in a further step,
first cle

net: ethernet: Prepare cleanup of powerpc's asm/prom.h

powerpc's asm/prom.h includes some headers that it doesn't
need itself.

In order to clean powerpc's asm/prom.h up in a further step,
first clean all files that include asm/prom.h

Some files don't need asm/prom.h at all. For those ones,
just remove inclusion of asm/prom.h

Some files don't need any of the items provided by asm/prom.h,
but need some of the headers included by asm/prom.h. For those
ones, add the needed headers that are brought by asm/prom.h at
the moment and remove asm/prom.h

Some files really need asm/prom.h but also need some of the
headers included by asm/prom.h. For those one, leave asm/prom.h
but also add the needed headers so that they can be removed
from asm/prom.h in a later step.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/09a13d592d628de95d30943e59b2170af5b48110.1651663857.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v5.15.37
# aeaf59b7 27-Apr-2022 Dany Madden <drt@linux.ibm.com>

Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"

This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae

When client requests channel or ring size larger than what th

Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"

This reverts commit 723ad916134784b317b72f3f6cf0f7ba774e5dae

When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.

Fixes: 723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v5.15.36, v5.15.35, v5.15.34
# 93b1ebb3 13-Apr-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: Allow multiple ltbs in txpool ltb_set

Allow multiple LTBs in the txpool's ltb_set. i.e rather than using
a single large LTB, use several smaller LTBs.

The first n-1 LTBs will all be of the

ibmvnic: Allow multiple ltbs in txpool ltb_set

Allow multiple LTBs in the txpool's ltb_set. i.e rather than using
a single large LTB, use several smaller LTBs.

The first n-1 LTBs will all be of the same size. The size of the last
LTB in the set depends on the number of buffers and buffer (mtu) size.
This strategy hopefully allows more reuse of the initial LTBs and also
reduces the chances of an allocation failure (of the large LTB) when
system is low in memory.

Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# a75de820 13-Apr-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: Allow multiple ltbs in rxpool ltb_set

Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all
be of the same size. The size of the last LTB in the set depends on the
number

ibmvnic: Allow multiple ltbs in rxpool ltb_set

Allow multiple LTBs in the rxpool's ltb_set. The first n-1 LTBs will all
be of the same size. The size of the last LTB in the set depends on the
number of buffers and buffer (mtu) size.

Having a set of LTBs per pool provides a couple of benefits.

First, with the current value of IBMVNIC_MAX_LTB_SIZE of 16MB, with an
MTU of 9000, we need a LTB (DMA buffer) of that size but the allocation
can fail in low memory conditions. With a set of LTBs per pool, we can
use several smaller (8MB) LTBs and hopefully have fewer allocation
failures. (See also comments in ibmvnic.h on the trade-off with smaller
LTBs)

Second since the kernel limits the size of the DMA buffer to 16MB (based
on MAX_ORDER), with a single DMA buffer per pool, the pool is also limited
to 16MB. This in turn limits the number of buffers per pool to 1763 when
MTU is 9000. With a set of LTBs per pool, we can have upto the max of 4096
buffers per pool even when MTU is 9000.

Suggested-by: Brian King <brking@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# d6b45850 13-Apr-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: convert rxpool ltb to a set of ltbs

Define and use interfaces that treat the long term buffer (LTB) of an
rxpool as a set of LTBs rather than a single LTB. The set only has one
LTB for now.

ibmvnic: convert rxpool ltb to a set of ltbs

Define and use interfaces that treat the long term buffer (LTB) of an
rxpool as a set of LTBs rather than a single LTB. The set only has one
LTB for now.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 0c91bf9c 13-Apr-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: define map_txpool_buf_to_ltb()

Define a helper to map a given txpool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per txpool
so the mappi

ibmvnic: define map_txpool_buf_to_ltb()

Define a helper to map a given txpool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per txpool
so the mapping is trivial. When we add support for multiple LTBs per
txpool, this helper will be more useful.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 2872a67c 13-Apr-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: define map_rxpool_buf_to_ltb()

Define a helper to map a given rx pool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per pool so
the mappin

ibmvnic: define map_rxpool_buf_to_ltb()

Define a helper to map a given rx pool buffer into its corresponding long
term buffer (LTB) and offset. Currently there is just one LTB per pool so
the mapping is trivial. When we add support for multiple LTBs per pool,
this helper will be more useful.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 8880fc66 13-Apr-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: rename local variable index to bufidx

The local variable 'index' is heavily used in some functions and is
confusing with the presence of other "index" fields like pool->index,
->consumer_in

ibmvnic: rename local variable index to bufidx

The local variable 'index' is heavily used in some functions and is
confusing with the presence of other "index" fields like pool->index,
->consumer_index, etc. Rename it to bufidx to better reflect that its
the index of a buffer in the pool.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30
# 4219196d 16-Mar-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: fix race between xmit and reset

There is a race between reset and the transmit paths that can lead to
ibmvnic_xmit() accessing an scrq after it has been freed in the reset
path. It can resu

ibmvnic: fix race between xmit and reset

There is a race between reset and the transmit paths that can lead to
ibmvnic_xmit() accessing an scrq after it has been freed in the reset
path. It can result in a crash like:

Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0080000016189f8
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic]
LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
Call Trace:
[c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable)
[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
[c000000000c9cfcc] sch_direct_xmit+0xec/0x330
[c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0
[c000000000c00ad4] __dev_queue_xmit+0x394/0x730
[c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding]
[c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding]
[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
[c000000000c00ca4] __dev_queue_xmit+0x564/0x730
[c000000000cf97e0] neigh_hh_output+0xd0/0x180
[c000000000cfa69c] ip_finish_output2+0x31c/0x5c0
[c000000000cfd244] __ip_queue_xmit+0x194/0x4f0
[c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0
[c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0
[c000000000d2d984] tcp_retransmit_skb+0x34/0x130
[c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0
[c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330
[c000000000d317bc] tcp_write_timer+0x5c/0x200
[c000000000243270] call_timer_fn+0x50/0x1c0
[c000000000243704] __run_timers.part.0+0x324/0x460
[c000000000243894] run_timer_softirq+0x54/0xa0
[c000000000ea713c] __do_softirq+0x15c/0x3e0
[c000000000166258] __irq_exit_rcu+0x158/0x190
[c000000000166420] irq_exit+0x20/0x40
[c00000000002853c] timer_interrupt+0x14c/0x2b0
[c000000000009a00] decrementer_common_virt+0x210/0x220
--- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c

The immediate cause of the crash is the access of tx_scrq in the following
snippet during a reset, where the tx_scrq can be either NULL or an address
that will soon be invalid:

ibmvnic_xmit()
{
...
tx_scrq = adapter->tx_scrq[queue_num];
txq = netdev_get_tx_queue(netdev, queue_num);
ind_bufp = &tx_scrq->ind_buf;

if (test_bit(0, &adapter->resetting)) {
...
}

But beyond that, the call to ibmvnic_xmit() itself is not safe during a
reset and the reset path attempts to avoid this by stopping the queue in
ibmvnic_cleanup(). However just after the queue was stopped, an in-flight
ibmvnic_complete_tx() could have restarted the queue even as the reset is
progressing.

Since the queue was restarted we could get a call to ibmvnic_xmit() which
can then access the bad tx_scrq (or other fields).

We cannot however simply have ibmvnic_complete_tx() check the ->resetting
bit and skip starting the queue. This can race at the "back-end" of a good
reset which just restarted the queue but has not cleared the ->resetting
bit yet. If we skip restarting the queue due to ->resetting being true,
the queue would remain stopped indefinitely potentially leading to transmit
timeouts.

IOW ->resetting is too broad for this purpose. Instead use a new flag
that indicates whether or not the queues are active. Only the open/
reset paths control when the queues are active. ibmvnic_complete_tx()
and others wake up the queue only if the queue is marked active.

So we will have:
A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open()

->resetting = true
->tx_queues_active = false
disable tx queues
...
->tx_queues_active = true
start tx queues

B. Tx interrupt in ibmvnic_complete_tx():

if (->tx_queues_active)
netif_wake_subqueue();

To ensure that ->tx_queues_active and state of the queues are consistent,
we need a lock which:

- must also be taken in the interrupt path (ibmvnic_complete_tx())
- shared across the multiple queues in the adapter (so they don't
become serialized)

Use rcu_read_lock() and have the reset thread synchronize_rcu() after
updating the ->tx_queues_active state.

While here, consolidate a few boolean fields in ibmvnic_adapter for
better alignment.

Based on discussions with Brian King and Dany Madden.

Fixes: 7ed5b31f4a66 ("net/ibmvnic: prevent more than one thread from running in reset")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v5.15.29, v5.15.28, v5.15.27, v5.15.26
# fd98693c 25-Feb-2022 Sukadev Bhattiprolu <sukadev@linux.ibm.com>

ibmvnic: Allow queueing resets during probe

We currently don't allow queuing resets when adapter is in VNIC_PROBING
state - instead we throw away the reset and return EBUSY. The reasoning
is probabl

ibmvnic: Allow queueing resets during probe

We currently don't allow queuing resets when adapter is in VNIC_PROBING
state - instead we throw away the reset and return EBUSY. The reasoning
is probably that during ibmvnic_probe() the ibmvnic_adapter itself is
being initialized so performing a reset during this time can lead us to
accessing fields in the ibmvnic_adapter that are not fully initialized.
A review of the code shows that all the adapter state neede to process a
reset is initialized before registering the CRQ so that should no longer
be a concern.

Further the expectation is that if we do get a reset (transport event)
during probe, the do..while() loop in ibmvnic_probe() will handle this
by reinitializing the CRQ.

While that is true to some extent, it is possible that the reset might
occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
such a reset will be thrown away. While the client assumes that the
adapter is functional, the vnic server will wait for the client to reinit
the adapter. This disconnect between the two leaves the adapter down
needing manual intervention.

Because ibmvnic_probe() has other work to do after initializing the CRQ
(such as registering the netdev at a minimum) and because the reset event
can occur at any instant after the CRQ is initialized, there will always
be a window between initializing the CRQ and considering the adapter
ready for resets (ie state == PROBED).

So rather than discarding resets during this window, allow queueing them
- but only process them after the adapter is fully initialized.

To do this, introduce a new completion state ->probe_done and have the
reset worker thread wait on this before processing resets.

This change brings up two new situations in or just after ibmvnic_probe().
First after one or more resets were queued, we encounter an error and
decide to retry the initialization. At that point the queued resets are
no longer relevant since we could be talking to a new vnic server. So we
must purge/flush the queued resets before restarting the initialization.
As a side note, since we are still in the probing stage and we have not
registered the netdev, it will not be CHANGE_PARAM reset.

Second this change opens up a potential race between the worker thread
in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the
following sequence of events:

1. Register CRQ
2. Get transport event before CRQ_INIT completes.
3. Tasklet schedules reset:
a) add rwi to list
b) schedule_work() to start worker thread which runs
and waits for ->probe_done.
4. ibmvnic_probe() decides to retry, purges rwi_list
5. Re-register crq and this time rest of probe succeeds - register
netdev and complete(->probe_done).
6. Worker thread resumes in __ibmvnic_reset() from 3b.
7. Worker thread sets ->resetting bit
8. ibmvnic_open() comes in, notices ->resetting bit, sets state
to IBMVNIC_OPEN and returns early expecting worker thread to
finish the open.
9. Worker thread finds rwi_list empty and returns without
opening the interface.

If this happens, the ->ndo_open() call is effectively lost and the
interface remains down. To address this, ensure that ->rwi_list is
not empty before setting the ->resetting bit. See also comments in
__ibmvnic_reset().

Fixes: 6a2fb0e99f9c ("ibmvnic: driver initialization for kdump/kexec")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


12345678910>>...45