Revision tags: v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
d2b10794 |
| 03-Aug-2021 |
Leon Romanovsky <leonro@nvidia.com> |
RDMA/core: Create clean QP creations interface for uverbs
Unify create QP creation interface to make clean approach to create XRC_TGT and regular QPs.
Link: https://lore.kernel.org/r/5cd50e7d8ad911
RDMA/core: Create clean QP creations interface for uverbs
Unify create QP creation interface to make clean approach to create XRC_TGT and regular QPs.
Link: https://lore.kernel.org/r/5cd50e7d8ad9112545a1a61dea62799a5cb3224a.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
5507f67d |
| 03-Aug-2021 |
Leon Romanovsky <leonro@nvidia.com> |
RDMA/core: Properly increment and decrement QP usecnts
The QP usecnts were incremented through QP attributes structure while decreased through QP itself. Rely on the ib_creat_qp_user() code that ini
RDMA/core: Properly increment and decrement QP usecnts
The QP usecnts were incremented through QP attributes structure while decreased through QP itself. Rely on the ib_creat_qp_user() code that initialized all QP parameters prior returning to the user and increment exactly like destroy does.
Link: https://lore.kernel.org/r/25d256a3bb1fc480b77d7fe439817b993de48610.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
8da9fe4e |
| 03-Aug-2021 |
Leon Romanovsky <leonro@nvidia.com> |
RDMA/core: Reorganize create QP low-level functions
The low-level create QP function grew to be larger than any sensible inline function should be. The inline attribute is not really needed for that
RDMA/core: Reorganize create QP low-level functions
The low-level create QP function grew to be larger than any sensible inline function should be. The inline attribute is not really needed for that function and can be implemented as exported symbol.
Link: https://lore.kernel.org/r/2c08709d86f876c3dfb77684357b2a939e570ca4.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
8fc3beeb |
| 03-Aug-2021 |
Leon Romanovsky <leonro@nvidia.com> |
RDMA/core: Delete duplicated and unreachable code
The ib_create_named_qp() is kernel verb and no kernel users exist that use XRC_INI QP. Hence such QP path is not reachable. In addition, delete dupl
RDMA/core: Delete duplicated and unreachable code
The ib_create_named_qp() is kernel verb and no kernel users exist that use XRC_INI QP. Hence such QP path is not reachable. In addition, delete duplicated assignments of QP attributes from the initialization structure.
Link: https://lore.kernel.org/r/1b4c0d1def5f8f6d26839e14d19da950cc4a0b05.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.53 |
|
#
514aee66 |
| 23-Jul-2021 |
Leon Romanovsky <leonro@nvidia.com> |
RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory.
Link: https:
RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory.
Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> #efa Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46 |
|
#
c5f8f2c5 |
| 16-Jun-2021 |
Anand Khoje <anand.a.khoje@oracle.com> |
IB/core: Removed port validity check from ib_get_cached_subnet_prefix
Removed port validity check from ib_get_cached_subnet_prefix() as this check is not needed because "port_num" is valid.
Link: h
IB/core: Removed port validity check from ib_get_cached_subnet_prefix
Removed port validity check from ib_get_cached_subnet_prefix() as this check is not needed because "port_num" is valid.
Link: https://lore.kernel.org/r/20210616154509.1047-2-anand.a.khoje@oracle.com Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
526a12c8 |
| 11-Jun-2021 |
Jason Gunthorpe <jgg@nvidia.com> |
RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
This code is trying to attach a list of counters grouped into 4 groups to the ib_port sysfs. Instead of creating a bunch of
RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's
This code is trying to attach a list of counters grouped into 4 groups to the ib_port sysfs. Instead of creating a bunch of kobjects simply express everything naturally as an ib_port_attribute and add a single attribute_groups list.
Remove all the naked kobject manipulations.
Link: https://lore.kernel.org/r/0d5a7241ee0fe66622de04fcbaafaf6a791d5c7c.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
b7066b32 |
| 11-Jun-2021 |
Jason Gunthorpe <jgg@nvidia.com> |
RDMA/core: Create the device hw_counters through the normal groups mechanism
Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add().
RDMA/core: Create the device hw_counters through the normal groups mechanism
Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add().
This requires setting up the hw_counters before device_add(), so it gets split up from the already split port sysfs flow.
Move all the memory freeing to the release function.
Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
d8a58838 |
| 11-Jun-2021 |
Jason Gunthorpe <jgg@nvidia.com> |
RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
It is much saner to store a pointer to the kobject structure that contains the cannonical stats pointer than to copy the
RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer
It is much saner to store a pointer to the kobject structure that contains the cannonical stats pointer than to copy the stats pointers into a public structure.
Future patches will require the sysfs pointer for other purposes.
Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20 |
|
#
1fb7f897 |
| 01-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows u
RDMA: Support more than 255 rdma ports
Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic.
This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed.
With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue.
When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported.
The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely
Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged.
While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt),
Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10 |
|
#
286e1d3f |
| 08-Dec-2020 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
RDMA/core: Clean up cq pool mechanism
The CQ pool mechanism had two problems:
1. The CQ pool lists were uninitialized in the device registration error flow. As a result, all the list pointers r
RDMA/core: Clean up cq pool mechanism
The CQ pool mechanism had two problems:
1. The CQ pool lists were uninitialized in the device registration error flow. As a result, all the list pointers remained NULL. This caused the kernel to crash (in procedure ib_cq_pool_destroy) when that error flow was taken (and unregister called). The stack trace snippet:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0×0000) ? not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI . . . RIP: 0010:ib_cq_pool_destroy+0x1b/0×70 [ib_core] . . . Call Trace: disable_device+0x9f/0×130 [ib_core] __ib_unregister_device+0x35/0×90 [ib_core] ib_register_device+0x529/0×610 [ib_core] __mlx5_ib_add+0x3a/0×70 [mlx5_ib] mlx5_add_device+0x87/0×1c0 [mlx5_core] mlx5_register_interface+0x74/0xc0 [mlx5_core] do_one_initcall+0x4b/0×1f4 do_init_module+0x5a/0×223 load_module+0x1938/0×1d40
2. At device unregister, when cleaning up the cq pool, the cq's in the pool lists were freed, but the cq entries were left in the list.
The fix for the first issue is to initialize the cq pool lists when the ib_device structure is allocated for a new device (in procedure _ib_alloc_device).
The fix for the second problem is to delete cq entries from the pool lists when cleaning up the cq pool.
In addition, procedure ib_cq_pool_destroy() is renamed to the more appropriate name ib_cq_pool_cleanup().
Fixes: 4aa1615268a8 ("RDMA/core: Fix ordering of CQ pool destruction") Link: https://lore.kernel.org/r/20201208073545.9723-2-leon@kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
66f57b87 |
| 17-Nov-2020 |
Leon Romanovsky <leonro@mellanox.com> |
RDMA/restrack: Support all QP types
The latest changes in restrack name handling allowed to simplify the QP creation code to support all types of QPs.
For example XRC QP are presented with rdmatool
RDMA/restrack: Support all QP types
The latest changes in restrack name handling allowed to simplify the QP creation code to support all types of QPs.
For example XRC QP are presented with rdmatool.
$ ibv_xsrq_pingpong & $ rdma res show qp link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib] link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs] link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong
Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org Reviewed-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
0413755c |
| 04-Nov-2020 |
Leon Romanovsky <leonro@mellanox.com> |
RDMA/restrack: Store all special QPs in restrack DB
Special QPs (SMI and GSI) have different rules in regards of their QP numbers. While all other QP numbers are unique per-device, the QP0 and QP1 a
RDMA/restrack: Store all special QPs in restrack DB
Special QPs (SMI and GSI) have different rules in regards of their QP numbers. While all other QP numbers are unique per-device, the QP0 and QP1 are created per-port as requested by IBTA.
In multiple port devices, the number of SMI and GSI QPs with be equal to the number ports.
$ rdma dev 0: ibp0s9: node_type ca fw 4.4.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 $ rdma link 0/1: ibp0s9/1: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP 0/2: ibp0s9/2: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state UNKNOWN physical_state UNKNOWN
Before: $ rdma res show qp type SMI,GSI link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
After: $ rdma res show qp type SMI,GSI link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/2 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/2 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
Link: https://lore.kernel.org/r/20201104144008.3808124-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11 |
|
#
b09c4d70 |
| 22-Sep-2020 |
Leon Romanovsky <leonro@mellanox.com> |
RDMA/restrack: Improve readability in task name management
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().
T
RDMA/restrack: Improve readability in task name management
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().
This uniformly makes all restracks add'd using rdma_restrack_add().
Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
c34a23c2 |
| 22-Sep-2020 |
Leon Romanovsky <leonro@mellanox.com> |
RDMA/restrack: Simplify restrack tracking in kernel flows
Have a single rdma_restrack_add() that adds an entry, there is no reason to split the user/kernel here, the rdma_restrack_set_task() is resp
RDMA/restrack: Simplify restrack tracking in kernel flows
Have a single rdma_restrack_add() that adds an entry, there is no reason to split the user/kernel here, the rdma_restrack_set_task() is responsible for this difference.
This patch prepares the code to the future requirement of making restrack is mandatory for managing ib objects.
Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
13ef5539 |
| 22-Sep-2020 |
Leon Romanovsky <leonro@mellanox.com> |
RDMA/restrack: Count references to the verbs objects
Refactor the restrack code to make sure the kref inside the restrack entry properly kref's the object in which it is embedded. This slight change
RDMA/restrack: Count references to the verbs objects
Refactor the restrack code to make sure the kref inside the restrack entry properly kref's the object in which it is embedded. This slight change is needed for future conversions of MR and QP which are refcounted before the release and kfree.
The ideal flow from ib_core perspective as follows: * Allocate ib_* structure with rdma_zalloc_*. * Set everything that is known to ib_core to that newly created object. * Initialize kref with restrack help * Call to driver specific allocation functions. * Insert into restrack DB .... * Return and release restrack with restrack_put.
Largely this means a rdma_restrack_new() should be called near allocating the containing structure.
Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43 |
|
#
c7ff819a |
| 27-May-2020 |
Yamin Friedman <yaminf@mellanox.com> |
RDMA/core: Introduce shared CQ pool API
Allow a ULP to ask the core to provide a completion queue based on a least-used search on a per-device CQ pools. The device CQ pools grow in a lazy fashion wh
RDMA/core: Introduce shared CQ pool API
Allow a ULP to ask the core to provide a completion queue based on a least-used search on a per-device CQ pools. The device CQ pools grow in a lazy fashion when more CQs are requested.
This feature reduces the amount of interrupts when using many QPs. Using shared CQs allows for more effcient completion handling. It also reduces the amount of overhead needed for CQ contexts.
Test setup: Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz servers. Running NVMeoF 4KB read IOs over ConnectX-5EX across Spectrum switch. TX-depth = 32. The patch was applied in the nvme driver on both the target and initiator. Four controllers are accessed from each core. In the current test case we have exposed sixteen NVMe namespaces using four different subsystems (four namespaces per subsystem) from one NVM port. Each controller allocated X queues (RDMA QPs) and attached to Y CQs. Before this series we had X == Y, i.e for four controllers we've created total of 4X QPs and 4X CQs. In the shared case, we've created 4X QPs and only X CQs which means that we have four controllers that share a completion queue per core. Until fourteen cores there is no significant change in performance and the number of interrupts per second is less than a million in the current case. ================================================== |Cores|Current KIOPs |Shared KIOPs |improvement| |-----|---------------|--------------|-----------| |14 |2332 |2723 |16.7% | |-----|---------------|--------------|-----------| |20 |2086 |2712 |30% | |-----|---------------|--------------|-----------| |28 |1971 |2669 |35.4% | |================================================= |Cores|Current avg lat|Shared avg lat|improvement| |-----|---------------|--------------|-----------| |14 |767us |657us |14.3% | |-----|---------------|--------------|-----------| |20 |1225us |943us |23% | |-----|---------------|--------------|-----------| |28 |1816us |1341us |26.1% | ======================================================== |Cores|Current interrupts|Shared interrupts|improvement| |-----|------------------|-----------------|-----------| |14 |1.6M/sec |0.4M/sec |72% | |-----|------------------|-----------------|-----------| |20 |2.8M/sec |0.6M/sec |72.4% | |-----|------------------|-----------------|-----------| |28 |2.9M/sec |0.8M/sec |63.4% | ==================================================================== |Cores|Current 99.99th PCTL lat|Shared 99.99th PCTL lat|improvement| |-----|------------------------|-----------------------|-----------| |14 |67ms |6ms |90.9% | |-----|------------------------|-----------------------|-----------| |20 |5ms |6ms |-10% | |-----|------------------------|-----------------------|-----------| |28 |8.7ms |6ms |25.9% | |===================================================================
Performance improvement with sixteen disks (sixteen CQs per core) is comparable.
Link: https://lore.kernel.org/r/1590568495-101621-3-git-send-email-yaminf@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23 |
|
#
e38b55ea |
| 27-Feb-2020 |
Maor Gottlieb <maorg@mellanox.com> |
RDMA/core: Fix protection fault in ib_mr_pool_destroy
Fix NULL pointer dereference in the error flow of ib_create_qp_user when accessing to uninitialized list pointers - rdma_mrs and sig_mrs. The fo
RDMA/core: Fix protection fault in ib_mr_pool_destroy
Fix NULL pointer dereference in the error flow of ib_create_qp_user when accessing to uninitialized list pointers - rdma_mrs and sig_mrs. The following crash from syzkaller revealed it.
kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 23167 Comm: syz-executor.1 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:ib_mr_pool_destroy+0x81/0x1f0 Code: 00 00 fc ff df 49 c1 ec 03 4d 01 fc e8 a8 ea 72 fe 41 80 3c 24 00 0f 85 62 01 00 00 48 8b 13 48 89 d6 4c 8d 6a c8 48 c1 ee 03 <42> 80 3c 3e 00 0f 85 34 01 00 00 48 8d 7a 08 4c 8b 02 48 89 fe 48 RSP: 0018:ffffc9000951f8b0 EFLAGS: 00010046 RAX: 0000000000040000 RBX: ffff88810f268038 RCX: ffffffff82c41628 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000951f850 RBP: ffff88810f268020 R08: 0000000000000004 R09: fffff520012a3f0a R10: 0000000000000001 R11: fffff520012a3f0a R12: ffffed1021e4d007 R13: ffffffffffffffc8 R14: 0000000000000246 R15: dffffc0000000000 FS: 00007f54bc788700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000116920002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rdma_rw_cleanup_mrs+0x15/0x30 ib_destroy_qp_user+0x674/0x7d0 ib_create_qp_user+0xb01/0x11c0 create_qp+0x1517/0x2130 ib_uverbs_create_qp+0x13e/0x190 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f54bc787c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49 RDX: 0000000000000040 RSI: 0000000020000540 RDI: 0000000000000003 RBP: 00007f54bc787c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54bc7886bc R13: 00000000004ca2ec R14: 000000000070ded0 R15: 0000000000000005
Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API") Link: https://lore.kernel.org/r/20200227112708.93023-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9 |
|
#
620d3f81 |
| 08-Jan-2020 |
Jason Gunthorpe <jgg@mellanox.com> |
RDMA/core: Do not erase the type of ib_qp.uobject
This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type.
Link: https://lore.k
RDMA/core: Do not erase the type of ib_qp.uobject
This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type.
Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3 |
|
#
6b57cea9 |
| 12-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
IB/core: Let IB core distribute cache update events
Currently when the low level driver notifies Pkey, GID, and port change events they are notified to the registered handlers in the order they are
IB/core: Let IB core distribute cache update events
Currently when the low level driver notifies Pkey, GID, and port change events they are notified to the registered handlers in the order they are registered.
IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey change events.
Since all GID queries done by ULPs are serviced by IB core, and the IB core deferes cache updates to a work queue, it is possible for other clients to see stale cache data when they handle their own events.
For example, the below call tree shows how ipoib will call rdma_query_gid() concurrently with the update to the cache sitting in the WQ.
mlx5_ib_handle_event() ib_dispatch_event() ib_cache_event() queue_work() -> slow cache update
[..] ipoib_event() queue_work() [..] work handler ipoib_ib_dev_flush_light() __ipoib_ib_dev_flush() ipoib_dev_addr_changed_valid() rdma_query_gid() <- Returns old GID, cache not updated.
Move all the event dispatch to a work queue so that the cache update is always done before any clients are notified.
Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to cache") Link: https://lore.kernel.org/r/20191212113024.336702-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9 |
|
#
c043ff2c |
| 30-Oct-2019 |
Michal Kalderon <michal.kalderon@marvell.com> |
RDMA: Connect between the mmap entry and the umap_priv structure
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext i
RDMA: Connect between the mmap entry and the umap_priv structure
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources.
However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy.
Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together.
Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
#
b86deba9 |
| 30-Oct-2019 |
Michal Kalderon <michal.kalderon@marvell.com> |
RDMA/core: Move core content from ib_uverbs to ib_core
Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step i
RDMA/core: Move core content from ib_uverbs to ib_core
Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well.
Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.3.8, v5.3.7 |
|
#
549af008 |
| 15-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
IB/core: Avoid deadlock during netlink message handling
When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive lo
IB/core: Avoid deadlock during netlink message handling
When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence.
[..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again
Due to above call sequence, following call trace and deadlock is observed.
kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented.
1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock.
2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class.
Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|
Revision tags: v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6 |
|
#
52e0a118 |
| 01-Aug-2019 |
Gal Pressman <galpress@amazon.com> |
RDMA/restrack: Track driver QP types in resource tracker
The check for QP type different than XRC has excluded driver QP types from the resource tracker. As a result, "rdma resource show" user comma
RDMA/restrack: Track driver QP types in resource tracker
The check for QP type different than XRC has excluded driver QP types from the resource tracker. As a result, "rdma resource show" user command would not show opened driver QPs which does not reflect the real state of the system.
Check QP type explicitly instead of assuming enum values/ordering.
Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190801104354.11417-1-galpress@amazon.com Signed-off-by: Doug Ledford <dledford@redhat.com>
show more ...
|
Revision tags: v5.2.5, v5.2.4, v5.2.3 |
|
#
1d2fedd8 |
| 23-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Support netlink commands in non init_net net namespaces
Now that IB core supports RDMA device binding with specific net namespace, enable IB core to accept netlink commands in non init_ne
RDMA/core: Support netlink commands in non init_net net namespaces
Now that IB core supports RDMA device binding with specific net namespace, enable IB core to accept netlink commands in non init_net namespaces.
This is done by having per net namespace netlink socket.
At present only netlink device handling client RDMA_NL_NLDEV supports device handling in multiple net namespaces. Hence do not accept netlink messages for other clients in non init_net net namespaces.
Link: https://lore.kernel.org/r/20190723070205.6247-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
show more ...
|