Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9 |
|
#
e632291a |
| 24-Jan-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
The cited commit creates child PKEY interfaces over netlink will multiple tx and rx queues, but some devices doesn't support more than 1 tx a
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
The cited commit creates child PKEY interfaces over netlink will multiple tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx queues. This causes to a crash when traffic is sent over the PKEY interface due to the parent having a single queue but the child having multiple queues.
This patch fixes the number of queues to 1 for legacy IPoIB at the earliest possible point in time.
BUG: kernel NULL pointer dereference, address: 000000000000036b PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0xcb/0x450 Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_clone+0x55/0xd0 ip6_finish_output2+0x3fe/0x690 ip6_finish_output+0xfa/0x310 ip6_send_skb+0x1e/0x60 udp_v6_send_skb+0x1e5/0x420 udpv6_sendmsg+0xb3c/0xe60 ? ip_mc_finish_output+0x180/0x180 ? __switch_to_asm+0x3a/0x60 ? __switch_to_asm+0x34/0x60 sock_sendmsg+0x33/0x40 __sys_sendto+0x103/0x160 ? _copy_to_user+0x21/0x30 ? kvm_clock_get_cycles+0xd/0x10 ? ktime_get_ts64+0x49/0xe0 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f9374f1ed14 Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc </TASK>
Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/95eb6b74c7cf49fa46281f9d056d685c9fa11d38.1674584576.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
show more ...
|
Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17 |
|
#
ccae0447 |
| 04-Jan-2023 |
Mark Zhang <markzhang@nvidia.com> |
RDMA/cma: Refactor the inbound/outbound path records process flow
Refactors based on comments [1] of the multiple path records support patchset: - Return failure if not able to set inbound/outbound
RDMA/cma: Refactor the inbound/outbound path records process flow
Refactors based on comments [1] of the multiple path records support patchset: - Return failure if not able to set inbound/outbound PRs; - Simplify the flow when receiving the PRs from netlink channel: When a good PR response is received, unpack it and call the path_query callback directly. This saves two memory allocations; - Define RDMA_PRIMARY_PATH_MAX_REC_NUM in a proper place.
[1] https://lore.kernel.org/linux-rdma/Yyxp9E9pJtUids2o@nvidia.com/
Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> #srp Link: https://lore.kernel.org/r/7610025d57342b8b6da0f19516c9612f9c3fdc37.1672819376.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
show more ...
|
Revision tags: v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66 |
|
#
5a374949 |
| 08-Sep-2022 |
Mark Zhang <markzhang@nvidia.com> |
RDMA/cma: Multiple path records support with netlink channel
Support receiving inbound and outbound IB path records (along with GMP PathRecord) from user-space service through the RDMA netlink chann
RDMA/cma: Multiple path records support with netlink channel
Support receiving inbound and outbound IB path records (along with GMP PathRecord) from user-space service through the RDMA netlink channel. The LIDs in these 3 PRs can be used in this way: 1. GMP PR: used as the standard local/remote LIDs; 2. DLID of outbound PR: Used as the "dlid" field for outbound traffic; 3. DLID of inbound PR: Used as the "dlid" field for outbound traffic in responder side.
This is aimed to support adaptive routing. With current IB routing solution when a packet goes out it's assigned with a fixed DLID per target, meaning a fixed router will be used. The LIDs in inbound/outbound path records can be used to identify group of routers that allow communication with another subnet's entity. With them packets from an inter-subnet connection may travel through any router in the set to reach the target.
As confirmed with Jason, when sending a netlink request, kernel uses LS_RESOLVE_PATH_USE_ALL so that the service knows kernel supports multiple PRs.
Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/2fa2b6c93c4c16c8915bac3cfc4f27be1d60519d.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
show more ...
|
Revision tags: v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53 |
|
#
2157f5ca |
| 05-Jul-2022 |
Jakub Kicinski <kuba@kernel.org> |
ipoib: switch to netif_napi_add_weight()
We want to remove the weight argument from the basic netif_napi_add() API and just default to 64. Switch ipoib to the new API for explicitly specifying the w
ipoib: switch to netif_napi_add_weight()
We want to remove the weight argument from the basic netif_napi_add() API and just default to 64. Switch ipoib to the new API for explicitly specifying the weight.
Link: https://lore.kernel.org/r/20220705230208.924408-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
show more ...
|
Revision tags: v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33 |
|
#
e945c653 |
| 04-Apr-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part
RDMA: Split kernel-only global device caps from uverbs device caps
Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part of the uapi. This limits the device_cap_flags to being the same bitmap that will be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid confusion in the flags bitmap.
Add some short comments describing which each of the kernel flags is connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14 |
|
#
10f7b9bc |
| 19-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
RDMA/ipoib: Use dev_addr_mod()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr i
RDMA/ipoib: Use dev_addr_mod()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers.
Link: https://lore.kernel.org/r/20211019182604.1441387-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.14.13, v5.14.12 |
|
#
8869574a |
| 10-Oct-2021 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
RDMA: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so there is no need to flush it explicitly.
Remove the redundant 'flush_workque
RDMA: Remove redundant 'flush_workqueue()' calls
'destroy_workqueue()' already drains the queue before destroying it, so there is no need to flush it explicitly.
Remove the redundant 'flush_workqueue()' calls.
This was generated with coccinelle:
@@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E);
Link: https://lore.kernel.org/r/ca7bac6e6c9c5cc8d04eec3944edb13de0e381a3.1633874776.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
a7605370 |
| 27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SI
dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
This is a purely cosmetic change intended to help readers find their way through the implementation.
Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46 |
|
#
bf194997 |
| 16-Jun-2021 |
Leon Romanovsky <leonro@nvidia.com> |
RDMA: Fix kernel-doc warnings about wrong comment
Compilation with W=1 produces warnings similar to the below.
drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment starts with '/
RDMA: Fix kernel-doc warnings about wrong comment
Compilation with W=1 produces warnings similar to the below.
drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
All such occurrences were found with the following one line git grep -A 1 "\/\*\*" drivers/infiniband/
Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.43, v5.10.42, v5.10.41 |
|
#
a5e27fb6 |
| 28-May-2021 |
Weihang Li <liweihang@huawei.com> |
RDMA/ipoib: Use refcount_t instead of atomic_t for reference counting
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks.
Link: https://lo
RDMA/ipoib: Use refcount_t instead of atomic_t for reference counting
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks.
Link: https://lore.kernel.org/r/1622194663-2383-13-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
1f8f60f3 |
| 26-May-2021 |
YueHaibing <yuehaibing@huawei.com> |
IB/ipoib: Use DEVICE_ATTR_*() macros
Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR, which makes the code a bit shorter and easier to read.
Link: https://lore.kernel.org/r/20210526132753.3
IB/ipoib: Use DEVICE_ATTR_*() macros
Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR, which makes the code a bit shorter and easier to read.
Link: https://lore.kernel.org/r/20210526132753.3092-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30 |
|
#
3ccbd933 |
| 08-Apr-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/ipoib: Print a message if only child interface is UP
When "enhanced IPoIB" is enabled for CX-5 devices it requires the parent device to be UP, otherwise the child devices won't work.
Thus add
RDMA/ipoib: Print a message if only child interface is UP
When "enhanced IPoIB" is enabled for CX-5 devices it requires the parent device to be UP, otherwise the child devices won't work.
Thus add a debug message to give admin a hint when only the child interface is UP but parent interface is not.
Link: https://lore.kernel.org/r/20210408093215.24023-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.27 |
|
#
042a00f9 |
| 29-Mar-2021 |
Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> |
IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev
The current rdma_netdev handling in ipoib hooks the tx_timeout handler, but prints out a totally useless message that prevents effective debugg
IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev
The current rdma_netdev handling in ipoib hooks the tx_timeout handler, but prints out a totally useless message that prevents effective debugging especially when multiple transmit queues are being used.
Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1 to print additional information.
The existing non-helpful message is avoided when the driver has presented a callback.
Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20 |
|
#
1fb7f897 |
| 01-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows u
RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed.
With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue.
When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported.
The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely
Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged.
While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt),
Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14 |
|
#
633d6102 |
| 28-Jan-2021 |
Christoph Lameter <cl@linux.com> |
RDMA/ipoib: Remove racy Subnet Manager sendonly join checks
When a system receives a REREG event from the SM, then the SM information in the kernel is marked as invalid and a request is sent to the
RDMA/ipoib: Remove racy Subnet Manager sendonly join checks
When a system receives a REREG event from the SM, then the SM information in the kernel is marked as invalid and a request is sent to the SM to update the information. The SM information is invalid in that time period.
However, receiving a REREG also occurs simultaneously in user space applications that are now trying to rejoin the multicast groups. Some of those may be sendonly multicast groups which are then failing.
If the SM information is invalid then ib_sa_sendonly_fullmem_support() returns false. That is wrong because it just means that we do not know yet if the potentially new SM supports sendonly joins.
Sendonly join was introduced in 2015 and all the Subnet managers have supported it ever since. So there is no point in checking if a subnet manager supports it.
Should an old opensm get a request for a sendonly join then the request will fail. The code that is removed here accomodated that situation and fell back to a full join.
Falling back to a full join is problematic in itself. The reason to use the sendonly join was to reduce the traffic on the Infiniband fabric otherwise one could have just stayed with the regular join. So this patch may cause users of very old opensms to discover that lots of traffic needlessly crosses their IB fabrics.
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9 |
|
#
1c7fd726 |
| 07-Oct-2020 |
Joe Perches <joe@perches.com> |
RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:
@@ identifier d_show; identifier dev, attr, buf; @@
ssize_t d_show(struct device *dev, struct device_attribu
RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:
@@ identifier d_show; identifier dev, attr, buf; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> }
@@ identifier d_show; identifier dev, attr, buf; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> }
@@ identifier d_show; identifier dev, attr, buf; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> }
@@ identifier d_show; identifier dev, attr, buf; expression chr; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> }
@@ identifier d_show; identifier dev, attr, buf; identifier len; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; }
@@ identifier d_show; identifier dev, attr, buf; identifier len; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; }
@@ identifier d_show; identifier dev, attr, buf; identifier len; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; }
@@ identifier d_show; identifier dev, attr, buf; identifier len; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; }
@@ identifier d_show; identifier dev, attr, buf; expression chr; @@
ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); }
Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.8.14 |
|
#
5ce2dced |
| 04-Oct-2020 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every IPoiB interface type, not just children created with 'ip link add'.
Afte
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every IPoiB interface type, not just children created with 'ip link add'.
After setting the rtnl_link_ops for the parent interface, implement the dellink() callback to block users from trying to remove it.
Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Link: https://lore.kernel.org/r/20201004132948.26669-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.8.13, v5.8.12 |
|
#
eff74233 |
| 25-Sep-2020 |
Taehee Yoo <ap420073@gmail.com> |
net: core: introduce struct netdev_nested_priv for nested interface infrastructure
Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both priv
net: core: introduce struct netdev_nested_priv for nested interface infrastructure
Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added.
In the following patch, a new member variable will be added into this struct to fix the lockdep issue.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61 |
|
#
df561f66 |
| 23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through mar
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
show more ...
|
Revision tags: v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6 |
|
#
87fb5c1c |
| 23-Jun-2020 |
Michael Guralnik <michaelgur@mellanox.com> |
RDMA/ipoib: Handle user-supplied address when creating child
Use the address supplied by user when creating a child interface.
Previously, the address requested by the user was ignored and overridd
RDMA/ipoib: Handle user-supplied address when creating child
Use the address supplied by user when creating a child interface.
Previously, the address requested by the user was ignored and overridden with parent's GID and the random QP number assigned to the child.
Link: https://lore.kernel.org/r/20200623110105.1225750-3-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
65936bf2 |
| 25-Jun-2020 |
Jason Gunthorpe <jgg@nvidia.com> |
RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()
ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that the only time flush_workqueue() can be called is if it also clears IPOIB_
RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()
ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that the only time flush_workqueue() can be called is if it also clears IPOIB_FLAG_OPER_UP.
Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky enough, and lockdep doesn't help us to find it early:
CPU0 CPU1 CPU2 __ipoib_ib_dev_flush() down_read(vlan_rwsem)
ipoib_vlan_add() rtnl_trylock() down_write(vlan_rwsem)
ipoib_mcast_carrier_on_task() while (!rtnl_trylock()) msleep(20);
ipoib_flush_ah() flush_workqueue(priv->wq)
Clean up the ah_reaper related functions and lifecycle to make sense:
- Start/Stop of the reaper should only be done in open/stop NDOs, not in any other places
- cancel and flush of the reaper should only happen in the stop NDO. cancel is only functional when combined with IPOIB_STOP_REAPER.
- Non-stop places were flushing the AH's just need to flush out dead AH's synchronously and ignore the background task completely. It is fully locked and harmless to leave running.
Which ultimately fixes the ABBA deadlock by removing the unnecessary flush_workqueue() from the problematic place under the vlan_rwsem.
Fixes: efc82eeeae4e ("IB/ipoib: No longer use flush as a parameter") Link: https://lore.kernel.org/r/20200625174219.290842-1-kamalheib1@gmail.com Reported-by: Kamal Heib <kheib@redhat.com> Tested-by: Kamal Heib <kheib@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43 |
|
#
1acba6a8 |
| 27-May-2020 |
Valentine Fatiev <valentinef@mellanox.com> |
IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
When connected mode is set, and we have connected and datagram traffic in parallel, ipoib might crash with double free of dat
IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
When connected mode is set, and we have connected and datagram traffic in parallel, ipoib might crash with double free of datagram skb.
The current mechanism assumes that the order in the completion queue is the same as the order of sent packets for all QPs. Order is kept only for specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and few CM's) in parallel.
The problem: ----------------------------------------------------------
Transmit queue: ----------------- UD skb pointer kept in queue itself, CM skb kept in spearate queue and uses transmit queue as a placeholder to count the number of total transmitted packets.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ tail head
Completion queue (problematic scenario) - the order not the same as in the transmit queue:
1 2 3 4 5 6 7 8 9 ------------------------------------ ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5 ------------------------------------
1. CM1 'wc' processing - skb freed in cm separate ring. - tx_tail of transmit queue increased although UD2 is not freed. Now driver assumes UD2 index is already freed and it could be used for new transmitted skb.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ ^ (Bad)tail head (Bad - Could be used for new SKB)
In this case (due to heavy load) UD2 skb pointer could be replaced by new transmitted packet UD_NEW, as the driver assumes its free. At this point we will have to process two 'wc' with same index but we have only one pointer to free.
During second attempt to free the same skb we will have NULL pointer exception.
2. UD2 'wc' processing - skb freed according the index we got from 'wc', but it was already overwritten by mistake. So actually the skb that was released is the skb of the new transmitted packet and not the original one.
3. UD_NEW 'wc' processing - attempt to free already freed skb. NUll pointer exception.
The fix: -----------------------------------------------------------------------
The fix is to stop using the UD ring as a placeholder for CM packets, the cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a new cyclic variables global_tx_head and global_tx_tail are introduced for managing and counting the overall outstanding sent packets, then the send queue will be stopped and waken based on these variables only.
Note that no locking is needed since global_tx_head is updated in the xmit flow and global_tx_tail is updated in the NAPI flow only. A previous attempt tried to use one variable to count the outstanding sent packets, but it did not work since xmit and NAPI flows can run at the same time and the counter will be updated wrongly. Thus, we use the same simple cyclic head and tail scheme that we have today for the UD tx_ring.
Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes") Link: https://lore.kernel.org/r/20200527134705.480068-1-leon@kernel.org Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.4.42, v5.4.41 |
|
#
8f149b68 |
| 11-May-2020 |
Gary Leshner <Gary.S.Leshner@intel.com> |
IB/ipoib: Add capability to switch between datagram and connected mode
This is the prerequisite modification to the ipoib ulp to allow a rdma netdev to obtain the default ndo ops for init/uninit/ope
IB/ipoib: Add capability to switch between datagram and connected mode
This is the prerequisite modification to the ipoib ulp to allow a rdma netdev to obtain the default ndo ops for init/uninit/open/close.
This is accomplished by setting the netdev ops field within the callback function passed to the netdev allocation routine which in turn was passed into the rdma netdev allocation routine.
This allows the rdma netdev to call back into the ulp to create the resources required for connected mode operation.
Additionally as the ulp is not re-entrant, when switching modes, the number of real tx queues is set to 1 for the connected mode.
For datagram mode the number of real tx queues is set to the actual number of tx queues specified at the netdev's allocation.
For the internal ulp netdev the number of tx queues defaults to 1.
It is up to the rdma netdev to specify the actual number it can support.
When the driver does not support a rdma netdev for acceleration, (-ENOTSUPPORTED return code or the verbs function for allocation is NULL) the ipoib ulp functions are unaffected by using the internal netdev allocated by the ipoib ulp.
Link: https://lore.kernel.org/r/20200511160706.173205.19086.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
#
b7e159eb |
| 11-May-2020 |
Gary Leshner <Gary.S.Leshner@intel.com> |
IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu size
When in connected mode ipoib sent broadcast pings which exceeded the mtu size for broadcast addresses.
Add an mtu attribu
IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu size
When in connected mode ipoib sent broadcast pings which exceeded the mtu size for broadcast addresses.
Add an mtu attribute to the rdma_netdev structure which ipoib sets to its mcast mtu size.
The RDMA netdev uses this value to determine if the skb length is too long for the mtu specified and if it is, drops the packet and logs an error about the errant packet.
Link: https://lore.kernel.org/r/20200511160655.173205.14546.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
#
6d72344c |
| 11-May-2020 |
Kaike Wan <kaike.wan@intel.com> |
IB/ipoib: Increase ipoib Datagram mode MTU's upper limit
Currently the ipoib UD mtu is restricted to 4K bytes. Remove this limitation so that the IPOIB module can potentially use an MTU (in UD mode)
IB/ipoib: Increase ipoib Datagram mode MTU's upper limit
Currently the ipoib UD mtu is restricted to 4K bytes. Remove this limitation so that the IPOIB module can potentially use an MTU (in UD mode) that is bounded by the MTU of the underlying device. A field is added to the ib_port_attr structure to indicate the maximum physical MTU the underlying device supports.
Link: https://lore.kernel.org/r/20200511160618.173205.23053.stgit@awfm-01.aw.intel.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|