History log of /openbmc/linux/net/Makefile (Results 276 – 300 of 857)
Revision Date Author Comments
# 36f87780 24-Jan-2017 David S. Miller <davem@davemloft.net>

Merge branch 'packet-sampling-offload'

Jiri Pirko says:

====================
Add support for offloading packet-sampling

Yotam says:

The first patch introduces the psample module, a netlink channel dedicated
to packet sampling implemented using generic netlink. This module provides
a generic way for kernel modules to sample packets, while not being tied
to any specific subsystem like NFLOG.

The second patch adds the sample tc action, which uses psample to randomly
sample packets that match a classifier. The user can configure the psample
group number, the sampling rate and the packet's truncation (to save
kernel-user traffic).
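
The three user-visible knobs named above (group number, sampling rate,
truncation) can be illustrated with a small userspace sketch. This is not the
act_sample code; struct sample_cfg, sample_hit() and sample_len() are
hypothetical names, and the random draw is passed in so the logic stays
deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical settings mirroring the three knobs named in the text. */
struct sample_cfg {
    uint32_t group;      /* psample group number */
    uint32_t rate;       /* sample one packet in `rate`, on average */
    uint32_t trunc_size; /* 0 means "do not truncate" */
};

/* Decide whether a packet is sampled; `rnd` is a uniformly random draw. */
static int sample_hit(const struct sample_cfg *cfg, uint32_t rnd)
{
    assert(cfg->rate > 0);
    return rnd % cfg->rate == 0;
}

/* Number of bytes handed to user space for a packet of `len` bytes. */
static size_t sample_len(const struct sample_cfg *cfg, size_t len)
{
    if (cfg->trunc_size != 0 && len > cfg->trunc_size)
        return cfg->trunc_size;
    return len;
}
```

With rate = 100 roughly one packet in a hundred is selected, and a 1500-byte
packet with trunc_size = 128 is reported with only its first 128 bytes.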

The last two patches add the support for offloading the matchall-sample
tc command in the mlxsw driver, for ingress qdiscs.

An example for psample usage can be found in the libpsample project at:
https://github.com/Mellanox/libpsample

v1->v2:
- Reword first patch's commit message
- Fix typo in comment in second patch
- Change order of tc_sample uapi enum to match convention
- Rename act_sample action callback tcf_sample -> tcf_sample_act
====================

Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ae0a628 23-Jan-2017 Yotam Gigi <yotamg@mellanox.com>

net: Introduce psample, a new genetlink channel for packet sampling

Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc., and allows packet sampling in the kernel to be standardized.

For every sampled packet, the psample module adds the following metadata
fields:

PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable

PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable

PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
truncated during sampling

PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
user who initiated the sampling. This field allows the user to
differentiate between several samplers working simultaneously and
filter the packets relevant to them

PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
sequence is kept for each group

PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets

PSAMPLE_ATTR_DATA - the actual packet bits

The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast
group. In addition, add the GET_GROUPS netlink command which allows the
user to see the current sample groups, their refcount and sequence number.
This command currently supports only netlink dump mode.
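
As a rough illustration of the metadata listed above, the sketch below gathers
the scalar fields in one record and keeps a sequence counter per group. The
struct and function names are hypothetical, and numbering the first packet of
a group as 0 is an assumption of this sketch, not something the commit
message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the per-packet metadata. */
struct sample_meta {
    int      iifindex;  /* PSAMPLE_ATTR_IIFINDEX, 0 if not applicable */
    int      oifindex;  /* PSAMPLE_ATTR_OIFINDEX, 0 if not applicable */
    uint32_t origsize;  /* PSAMPLE_ATTR_ORIGSIZE */
    uint32_t group;     /* PSAMPLE_ATTR_SAMPLE_GROUP */
    uint32_t group_seq; /* PSAMPLE_ATTR_GROUP_SEQ */
    uint32_t rate;      /* PSAMPLE_ATTR_SAMPLE_RATE */
};

/* One sample group: its number and its own sequence counter. */
struct sample_group {
    uint32_t num;
    uint32_t seq;
};

/* Build the metadata for one sampled packet and advance the group counter. */
static struct sample_meta sample_meta_fill(struct sample_group *g,
                                           int iif, int oif,
                                           uint32_t origsize, uint32_t rate)
{
    assert(g != NULL);
    struct sample_meta m = {
        .iifindex = iif,
        .oifindex = oif,
        .origsize = origsize,
        .group = g->num,
        .group_seq = g->seq++,
        .rate = rate,
    };
    return m;
}
```

Because the counter lives in the group, a listener that sees a gap in
group_seq for its own group can tell that samples were dropped, regardless of
traffic in other groups.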

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62ed8ced 24-Jan-2017 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v4.10-rc5' into for-linus

Sync up with mainline to apply fixup to a commit that came through
power supply tree.


# f3a3e248 09-Jan-2017 David S. Miller <davem@davemloft.net>

Merge branch 'net-smc'

Ursula Braun says:

====================
net/smc: Shared Memory Communications - RDMA

Here is V4 of the SMC-R patches, incorporating your feedback from the end of
November. The most important change is the replacement of sysfs by a
generic netlink solution in patch 04. And I tried to get rid of the __packed
attributes. There are still a few usages left due to SMC-R protocol defined
structures.

V4 changes:
The order of patches 03 and 04 for pnet table management and SMC IB-client
establishing has been exchanged, since pnet table management is now built on
top of smc_ib_devices.
Patch 01: Use EXPORT_SYMBOL_GPL().
Patch 02: Define "use_fallback" as bool.
Get rid of useless smc_sock fields clearing in smc_sock_alloc(),
since sk_alloc() clears out the memory.
Patch 03: Postpone smc_ib_remember_port_attr() call till ib_device is
mentioned in the pnet table.
Patch 04: Replace sysfs-usage by a generic netlink approach for pnet table
configuration.
Change layout of pnet table entries to reference net_device and
ib_device instead of dealing with names of net_devices and
ib_devices.
Patch 05: Adapt "use_fallback" usages to new type bool.
Get rid of useless smc_sock fields clearing in smc_sock_alloc()
Avoid __packed where possible.
Check if clc responses are not too big.
Patch 09: Postpone smc_setup_per_ibdev till the first connection with this
ib_device is really created.
Patch 11: Get rid of __packed usage.

V3 changes:
Patch 05: Remove unneeded DEFINE_WAIT
Patch 06: Improve synchronization of link group creation
Patch 07: Rename peer_rmbe_len into peer_rmbe_size to be more consistent
Patch 09: Avoid calls of ib_get_memory_region with IB_ACCESS_LOCAL_WRITE,
use new default local_dma_lkey from protection domain as lkey
instead.
Remove no longer needed function smc_ib_dereg_memory_region().
Patch 14: Switch to state ACTIVE only if still in state INIT.
Return 0 for recvmsg invoked in a socket closing state.
Allow getname call in state APPCLOSEWAIT1
Do not trigger destruction of a socket-in-error queued in accept
queue.
During cleanup of accept queue, make sure sockets are destructed,
and sockets in fallback mode are handled appropriately.
When freeing sndbufs/rmbs, remove them from their list and free
the entry.
Use add_wait_queue() and remove_wait_queue() in close wait
functions.
If actively closing a socket in state for PEERFINCLOSEWAIT, keep
this state.
If passively closing a socket while bytes are to be received, move
to state APPCLOSEWAIT1.
If actively aborting a socket, skip sending the close_abort flag,
since RDMA communication is no longer possible.
When terminating a link group, do not schedule link group freeing a
2nd time, since already done when unregistering the last remaining
connection.
Patch 15: Introduce smc_diag module for monitoring SMC protocol sockets.
This replaces the old patch 0015 dealing with procfs.

V2 changes:
Patch 0002: Add SMC versions for family key strings in net/core/sock.c.
Patch 0006: initialize rb_tree.
Patch 0007: Get rid of unneeded use of xchg() in smc_sndbuf_unuse() and
smc_rmb_unuse().
Patch 0008: Correct error checking logic for ib_function calls.
Define struct smc_link field wr_tx_id as atomic_long_t.
Use "do_div" instead of "%" to be architecture-independent.
Patch 0009: Correct error checking logic for ib_function calls.
Patch 0011: Remove xchg() calls in cursor handling. Use atomic64_t for cursor
overlays on 64-bit architectures. If not available, use plain u64
and add locking for cursor reading and writing.
Implement smc_curs_add() without modulo operator "%".
Patch 0012: Remove xchg() calls in cursor handling.
Implement smc_tx_rdma_writes() without modulo operator "%".
Patch 0013: Remove xchg() calls in cursor handling.
Patch 0014: Return type bool in smc_wr_tx_has_pending().
Remove unneeded semicolon in smc_close_shutdown_write().
Call smc_close_active() in non-fallback case only.
Get rid of duplicate schedule of sock_put_work().
Take nested sock_lock in smc_listen_work().
Start close stream_wait in case of prepared sends only.
Patch 0015: Remove unneeded socket ref_count in smc_proc_seq_show().
Take lock before list_empty check in smc_proc_sock_list_del().

These patches are the initial part of the implementation of the
"Shared Memory Communications-RDMA" (SMC-R) protocol as defined in
RFC7609 [1]. While SMC-R does not aim to replace TCP,
it taps a wealth of existing data center TCP socket applications
to become more efficient without the need for rewriting them.
SMC-R uses RDMA over Converged Ethernet (RoCE) to save CPU consumption.
For instance, when running 10 parallel connections with uperf, we measured
a decrease of 60% in CPU consumption with SMC-R compared to TCP/IP
(with throughput and latency comparable;
measured on x86_64 with the same RoCE card and port).

SMC-R does not require an RDMA communication manager (RDMA CM).

SMC-R inherits TCP qualities such as reliable connections, host-based
firewall packet filtering (on connection establishment) and unmodified
application of communication encryption such as TLS (transport layer
security) or SSL (secure sockets layer). Since original TCP is used to
establish SMC-R connections, load balancers and packet inspection based
on TCP/IP connection establishment continue to work for SMC-R.

On the other hand, using SMC-R implies:
- either involving a preload library when invoking the unchanged TCP-application
or slightly modifying the source by simply changing the socket family in
the socket() call
- accepting extra overhead and latency in connection establishment due to
SMC Connection Layer Control (CLC) handshake
- explicit coupling of RoCE ports with Ethernet ports
- not routable as currently built on RoCE V1
- bypassing of packet-based networking features
- filtering (netfilter)
- sniffing (libpcap, packet sockets, (E)BPF)
- traffic control (scheduling, shaping)
- bypassing of IP-header based socket options
- bypassing of memory buffer (pressure) management
- unusable together with IPsec

Overview of the SMC-R Protocol described in informational RFC 7609

SMC-R is an open protocol that provides RDMA capabilities over RoCE
transparently for applications exploiting TCP sockets.
A new socket protocol family PF_SMC is introduced.
There are no changes required to applications using the sockets API for TCP
stream sockets other than the specification of the new socket family AF_SMC.
Unmodified applications can be used by means of a dynamic preload shared
library which rewrites the socket API call
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) into
socket(AF_SMC, SOCK_STREAM, IPPROTO_TCP).
SMC-R re-uses the address family AF_INET for all addressing purposes around
struct sockaddr.
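
The rewrite performed by such a preload library can be sketched as a pure
decision function. smc_pick_family() is a hypothetical helper, not part of any
existing library; the fallback definition of AF_SMC as 43 matches the value in
the Linux headers and is only there for C libraries that do not define it yet.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43 /* value from the Linux headers, for older C libraries */
#endif

/*
 * Return the address family a wrapped socket() call should use:
 * TCP stream sockets over AF_INET are switched to AF_SMC, everything
 * else is passed through unchanged.
 */
static int smc_pick_family(int domain, int type, int protocol)
{
    assert(domain >= 0);
    if (domain == AF_INET && type == SOCK_STREAM &&
        (protocol == 0 || protocol == IPPROTO_TCP))
        return AF_SMC;
    return domain;
}
```

A real preload library would apply this decision inside its own socket()
wrapper and then call the C library's socket() with the returned family; that
part is omitted here.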

SMC-R system architecture layers:

+=============================================================================+
|                                      | unmodified TCP application           |
| native SMC application               +--------------------------------------+
|                                      | dynamic preload shared library       |
+=============================================================================+
|                                 SMC socket                                  |
+-----------------------------------------------------------------------------+
|                    | TCP socket (for connection establishment and fallback) |
| IB verbs           +--------------------------------------------------------+
|                    | IP                                                     |
+--------------------+--------------------------------------------------------+
| RoCE device driver | some network device driver                             |
+=============================================================================+

Terms:

A link group is determined by an ordered peer pair of TCP client and TCP server
(IP addresses and subnet). Reversed client/server roles result in a separate link group.
A link is a logical point-to-point connection based on an
infiniband reliable connected queue pair (RC-QP) between two RoCE ports
(MACs and GIDs) of a peer pair.
A link group can have 1..8 links for failover and load balancing.
This initial Linux implementation always has 1 link per link group.
Each link group on a peer can have 1..255 remote memory buffers (RMBs).
If more RMBs are needed, a peer can open another link group
(this initial Linux implementation) or fall back to TCP.
Each RMB has its own particular size and its own (R)DMA mapping and credentials
(rtoken consisting of rkey and RDMA "virtual address").
This initial Linux implementation uses physically contiguous memory for RMBs
but we are working towards scattered memory because of memory fragmentation.
Each RMB has 1..255 RMB elements (RMBEs) of equal size
to provide multiplexing of connections within an RMB.
An RMBE is the RDMA Write destination organized as wrapping ring buffer
for data transmit of a particular connection in one direction
(duplex by means of mirror symmetry as with TCP).
This initial Linux implementation always has 1 RMBE per RMB
and thus an individual RMB for each connection.

SMC-R connection establishment with subsequent data transfer:

CLIENT SERVER

TCP three-way handshake:
regular TCP SYN
-------------------------------------------------------->
regular TCP SYN ACK
<--------------------------------------------------------
regular TCP ACK
-------------------------------------------------------->

SMC Connection Layer Control (CLC) handshake
exchanges RDMA credentials between peers:
via above TCP connection: SMC CLC Proposal
-------------------------------------------------------->
via above TCP connection: SMC CLC Accept
<--------------------------------------------------------
via above TCP connection: SMC CLC Confirm
-------------------------------------------------------->

SMC Link Layer Control (LLC) (only once per link, i.e. 1st conn. of link group):
RoCE RC-QP: SMC LLC Confirm Link
<========================================================
RoCE RC-QP: SMC LLC Confirm Link response
========================================================>

SMC data transmission (incl. SMC Connection Data Control (CDC) message):
RoCE RC-QP: RDMA Write
========================================================>
RoCE RC-QP: SMC CDC message (flow control)
========================================================>
...

RoCE RC-QP: RDMA Write
<========================================================
RoCE RC-QP: SMC CDC message (flow control)
<========================================================
...

Data flow within an established connection:

+----------------------------------------------------------------------------
| SENDER
| sendmsg()
| |
| | produces into sndbuf [sender's process context]
| v
| +--------+
| | sndbuf | [ring buffer]
| +--------+
| |
| | consumes from sndbuf and produces into receiver's RMBE [any context]
| | by sending RDMA Write followed by SMC CDC message over RoCE RC-QP
| |
+----|-----------------------------------------------------------------------
|
+----|-----------------------------------------------------------------------
| v RECEIVER
| +------+
| | RMBE | [ring buffer, can have size different from sender's sndbuf]
| | | [RMBE represents rcvbuf, no further de-coupling as on sender side]
| +------+
| |
| | consumes from RMBE [receiver's process context]
| v
| recvmsg()
+----------------------------------------------------------------------------

Flow control ("cursor" updates) by means of SMC CDC messages:

SENDER RECEIVER

sends updates via CDC-------------+ sends updates via CDC
on consuming from sndbuf | on consuming from RMBE
and producing into RMBE | by means of recvmsg()
| |
| |
+-----------------------------------|------------+
| |
+--v-------------------------+ +--v-----------------------+
| receiver's consumer cursor | | sender's producer cursor----+
+----------------|-----------+ +--------------------------+ |
| |
| receiver's RMBE |
| +--------------------------+ |
| | | |
+--------------------------------+ | |
| | | |
| v | |
| +------------| |
|-------------+////////////| |
|//RDMA data written by////| |
|////sender that is////////| |
|/available to be consumed/| |
|///////// +---------------| |
|----------+^ | |
| | | |
| +-----------------+
| |
+--------------------------+

Sending updates of the producer cursor is immediate for low latency;
something like Nagle's algorithm (absence of TCP_NODELAY) is optional and
currently not part of this initial Linux implementation.
Sending updates of the consumer cursor is conditional to avoid the
silly window syndrome.
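
The cursor bookkeeping behind this flow control can be sketched as follows. A
cursor is modelled as a wrap counter plus a byte offset, and it is advanced
without the modulo operator, in the spirit of the V2 change notes. The names
ring_cursor, cursor_add(), cursor_diff() and rmbe_space() are hypothetical,
and the sketch assumes the two cursors are never more than one wrap apart.

```c
#include <assert.h>
#include <stdint.h>

struct ring_cursor {
    uint16_t wrap;  /* how often the cursor has passed the buffer end */
    uint32_t count; /* byte offset inside the ring buffer */
};

/* Advance a cursor by `value` bytes in a ring of `size` bytes, no "%". */
static void cursor_add(uint32_t size, struct ring_cursor *c, uint32_t value)
{
    assert(size > 0 && c->count < size && value <= size);
    c->count += value;
    if (c->count >= size) {
        c->wrap++;
        c->count -= size;
    }
}

/* Bytes between an older and a newer cursor (at most one wrap apart). */
static uint32_t cursor_diff(uint32_t size, const struct ring_cursor *older,
                            const struct ring_cursor *newer)
{
    if (older->wrap != newer->wrap)
        return (size - older->count) + newer->count;
    return newer->count >= older->count ? newer->count - older->count : 0;
}

/* Free space the sender may still write into the receiver's RMBE. */
static uint32_t rmbe_space(uint32_t size, const struct ring_cursor *prod,
                           const struct ring_cursor *cons)
{
    return size - cursor_diff(size, cons, prod);
}
```

In an 8-byte ring with both cursors at offset 6, producing 5 bytes moves the
producer cursor to offset 3 with one wrap, leaving 5 bytes to consume and 3
bytes of free space.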

Normal connection termination:

Normal connection termination starts transitioning from socket state
ACTIVE via either "Active Close" or "Passive Close".

shutdown rdwr +-----------------+
or close, +-------------->| INIT / CLOSED |<-------------+
send PeerCon|nClosed +-----------------+ | PeerConnClosed
| | | received
| connection | established |
| V |
+----------------+ +-----------------+ +----------------+
|AppFinCloseWait | | ACTIVE | |PeerFinCloseWait|
+----------------+ +-----------------+ +----------------+
| | | |
| Active Close: | |Passive Close: |
| close or | |PeerConnClosed or |
| shutdown wr or| |PeerDoneWriting |
| shutdown rdwr | |received |

| V V |
PeerConnClo|sed +--------------+ +-------------+ | close or
received +--<----|PeerCloseWait1| |AppCloseWait1|--->----+ shutdown rdwr,
| +--------------+ +-------------+ | send
| PeerDoneWri|ting | shutdown wr, | PeerConnClosed
| received | send Pee|rDoneWriting |
| V V |
| +--------------+ +-------------+ |
+--<----|PeerCloseWait2| |AppCloseWait2|--->----+
+--------------+ +-------------+

In state CLOSED, the socket can be destructed only once the application has
issued a close().
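
One reading of the normal-termination diagram above can be written down as a
transition function. This is a sketch only: the enum names are hypothetical,
"close" and "shutdown rdwr" are folded into a single event, the control
messages that are sent on each transition are left out, and events that the
diagram does not show for a state leave the state unchanged.

```c
#include <assert.h>

enum conn_state {
    ST_CLOSED, ST_ACTIVE,
    ST_PEERCLOSEWAIT1, ST_PEERCLOSEWAIT2, ST_APPFINCLOSEWAIT,
    ST_APPCLOSEWAIT1, ST_APPCLOSEWAIT2, ST_PEERFINCLOSEWAIT
};

enum conn_event {
    EV_CLOSE,             /* close or shutdown rdwr by the application */
    EV_SHUTDOWN_WR,       /* shutdown wr by the application */
    EV_PEER_DONE_WRITING, /* PeerDoneWriting received */
    EV_PEER_CONN_CLOSED   /* PeerConnClosed received */
};

static enum conn_state conn_next(enum conn_state s, enum conn_event e)
{
    assert((int)s >= ST_CLOSED && (int)s <= ST_PEERFINCLOSEWAIT);
    switch (s) {
    case ST_ACTIVE:
        /* Active close is started locally, passive close by the peer. */
        if (e == EV_CLOSE || e == EV_SHUTDOWN_WR)
            return ST_PEERCLOSEWAIT1;
        return ST_APPCLOSEWAIT1;
    case ST_PEERCLOSEWAIT1:
        if (e == EV_PEER_DONE_WRITING)
            return ST_PEERCLOSEWAIT2;
        if (e == EV_PEER_CONN_CLOSED)
            return ST_APPFINCLOSEWAIT;
        return s;
    case ST_PEERCLOSEWAIT2:
        return e == EV_PEER_CONN_CLOSED ? ST_APPFINCLOSEWAIT : s;
    case ST_APPFINCLOSEWAIT:
        return e == EV_CLOSE ? ST_CLOSED : s;
    case ST_APPCLOSEWAIT1:
        if (e == EV_SHUTDOWN_WR)
            return ST_APPCLOSEWAIT2;
        if (e == EV_CLOSE)
            return ST_PEERFINCLOSEWAIT;
        return s;
    case ST_APPCLOSEWAIT2:
        return e == EV_CLOSE ? ST_PEERFINCLOSEWAIT : s;
    case ST_PEERFINCLOSEWAIT:
        return e == EV_PEER_CONN_CLOSED ? ST_CLOSED : s;
    default:
        return s;
    }
}
```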

Abnormal connection termination:

+-----------------+
+-------------->| INIT / CLOSED |<-------------+
| +-----------------+ |
| |
| +-----------------------+ |
| | Any state | |
PeerConnAbo|rt | (before setting | | send
received | | PeerConnClosed | | PeerConnAbort
| | indicator in | |
| | peer's RMBE) | |
| +-----------------------+ |
| | | |
| Active Abort: | | Passive Abort: |
| problem, | | PeerConnAbort |
| send | | received, |
| PeerConnAbort,| | ECONNRESET |
| ECONNABORTED | | |
| V V |
| +--------------+ +--------------+ |
+-------|PeerAbortWait | | ProcessAbort |------+
+--------------+ +--------------+

Implementation notes beyond RFC 7609:

A PNET table in sysfs provides the mapping between network device names and
RoCE Infiniband device names for the transparent switch of data communication.
A PNET table can contain an arbitrary number of PNETIDs.
Each PNETID contains exactly one (Ethernet) network device name
and one or more RoCE Infiniband device names.
Each device name can only exist in at most one PNETID (no overlapping).
This initial Linux implementation allows at most one RoCE Infiniband device
name per PNETID.
After a new TCP connection is established, the network device name
used for egress traffic with the TCP connection's local source IP address
is used as key to lookup the unique PNETID, and the RoCE Infiniband device
of this PNETID is used to switch data communication from TCP to RDMA
during SMC CLC handshake.
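
The lookup described above can be sketched with a flat table. The entry layout
follows the restriction of this initial implementation (one network device and
one RoCE Infiniband device per PNETID); struct pnet_entry and
pnet_find_ibdev() are hypothetical names and do not reflect the kernel's data
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One PNETID: exactly one Ethernet device and one RoCE Infiniband device. */
struct pnet_entry {
    const char *pnetid;
    const char *netdev;
    const char *ibdev;
};

/*
 * Look up the RoCE device for the network device used by the TCP
 * connection. Returns NULL when no PNETID contains the device, in which
 * case the connection would stay on TCP.
 */
static const char *pnet_find_ibdev(const struct pnet_entry *tbl, size_t n,
                                   const char *netdev)
{
    assert(netdev != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].netdev, netdev) == 0)
            return tbl[i].ibdev;
    }
    return NULL;
}
```

Since a device name may appear in at most one PNETID, the first match is also
the only one, so a linear scan gives a unique answer.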

Problem determination:

A protocol dissector is available with upstream wireshark for formatting
SMC-R related RoCE LAN traffic.
[https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-smcr.c]

We are working on enhancing the Linux implementation to cover:

- Improve default socket closing asynchronicity
- Address corner cases with many parallel connections
- Tracing
- Integrated load balancing and fail-over within a link group
- Splice and sendpage support
- IPv6 addressing support
- Keepalive, Cork
- Namespaces support
- Urgent data
- More socket options
- Diagnostics
- Statistics support
- SNMP support

References:

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
====================

Signed-off-by: David S. Miller <davem@davemloft.net>


# ac713874 09-Jan-2017 Ursula Braun <ubraun@linux.vnet.ibm.com>

smc: establish new socket family

* enable smc module loading and unloading
* register new socket family
* basic smc socket creation and deletion
* use backing TCP socket to run CLC (Connection Layer Control)
handshake of SMC protocol
* Setup for infiniband traffic is implemented in follow-on patches.
For now fallback to TCP socket is always used.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f26e8817 16-Dec-2016 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 4.10 merge window.


Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31
# 712cba5d 07-Nov-2016 Max Filippov <jcmvbkbc@gmail.com>

Merge tag 'v4.9-rc3' into xtensa-for-next

Linux 4.9-rc3


# cc9b9402 04-Nov-2016 Mark Brown <broonie@kernel.org>

Merge branch 'topic/error' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into regulator-fixed


# 9902aa47 01-Nov-2016 Russell King <rmk+kernel@armlinux.org.uk>

Merge branch 'drm-tda998x-mali' into drm-tda998x-devel


Revision tags: v4.4.30, v4.4.29
# fe0f59c4 30-Oct-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge back earlier cpufreq material for v4.10.


Revision tags: v4.4.28
# 0fc4f78f 25-Oct-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

Merge remote-tracking branch 'airlied/drm-next' into topic/drm-misc

Backmerge latest drm-next to have a baseline for the
s/fence/dma_fence/ patch from Chris.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# f9bf1d97 25-Oct-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued

Backmerge because Chris Wilson needs the very latest and greatest of
Gustavo Padovan's sync_file work, specifically the refcounting changes
from:

commit 30cd85dd6edc86ea8d8589efb813f1fad41ef233
Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date: Wed Oct 19 15:48:32 2016 -0200

dma-buf/sync_file: hold reference to fence when creating sync_file

Also good to sync in general since git tends to get confused with the
cherry-picking going on.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26
# aea98380 17-Oct-2016 Mauro Carvalho Chehab <mchehab@s-opensource.com>

Merge tag 'v4.9-rc1' into patchwork

Linux 4.9-rc1

* tag 'v4.9-rc1': (13774 commits)
Linux 4.9-rc1
score: traps: Add missing include file to fix build error
fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
fs/super.c: fix race between freeze_super() and thaw_super()
overlayfs: Fix setting IOP_XATTR flag
iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
CIFS: Retrieve uid and gid from special sid if enabled
CIFS: Add new mount option to set owner uid and gid from special sids in acl
qedr: Add events support and register IB device
qedr: Add GSI support
qedr: Add LL2 RoCE interface
qedr: Add support for data path
qedr: Add support for memory registeration verbs
qedr: Add support for QP verbs
qedr: Add support for PD,PKEY and CQ verbs
qedr: Add support for user context verbs
qedr: Add support for RoCE HW init
qedr: Add RoCE driver framework
pkeys: Remove easily triggered WARN
MIPS: Wire up new pkey_{mprotect,alloc,free} syscalls
...


Revision tags: v4.7.8, v4.4.25
# 4d69f155 16-Oct-2016 Ingo Molnar <mingo@kernel.org>

Merge tag 'v4.9-rc1' into x86/fpu, to resolve conflict

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4a7126a2 13-Oct-2016 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v4.8' into next

Sync up with mainline to bring in I2C host notify changes and other
updates.


# 8b2ada27 28-Oct-2016 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes'

* pm-cpufreq-fixes:
cpufreq: intel_pstate: Always set max P-state in performance mode
cpufreq: intel_pstate: Set P-state upfront in performance mode

* pm-sleep-fixes:
PM / suspend: Fix missing KERN_CONT for suspend message


# 1d33369d 16-Oct-2016 Ingo Molnar <mingo@kernel.org>

Merge tag 'v4.9-rc1' into x86/urgent, to pick up updates

Signed-off-by: Ingo Molnar <mingo@kernel.org>


Revision tags: v4.4.24, v4.7.7
# 687ee0ad 05-Oct-2016 Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and
co. at Google. https://lwn.net/Articles/701165/

2) Do TCP Small Queues for retransmits, from Eric Dumazet.

3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei
Starovoitov.

4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai.

5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn.

6) Support GMAC protocol in iwlwifi mvm, from Ayala Beker.

7) Support ndo_poll_controller in mlx5, from Calvin Owens.

8) Move VRF processing to an output hook and allow l3mdev to be
loopback, from David Ahern.

9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern.

10) Congestion control in RXRPC, from David Howells.

11) Support geneve RX offload in ixgbe, from Emil Tantilov.

12) When hitting pressure for new incoming TCP data SKBs, perform a
partial rather than a full purge of the OFO queue (which could be
huge). From Eric Dumazet.

13) Convert XFRM state and policy lookups to RCU, from Florian Westphal.

14) Support RX network flow classification to igb, from Gangfeng Huang.

15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski.

16) New skbmod packet action, from Jamal Hadi Salim.

17) Remove some inefficiencies in snmp proc output, from Jia He.

18) Add FIB notifications to properly propagate route changes to
hardware which is doing forwarding offloading. From Jiri Pirko.

19) New dsa driver for qca8xxx chips, from John Crispin.

20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej
Żenczykowski.

21) Add L3 mode to ipvlan, from Mahesh Bandewar.

22) Support 802.1ad in mlx4, from Moshe Shemesh.

23) Support hardware LRO in mediatek driver, from Nelson Chang.

24) Add TC offloading to mlx5, from Or Gerlitz.

25) Convert various drivers to ethtool ksettings interfaces, from
Philippe Reynes.

26) TX max rate limiting for cxgb4, from Rahul Lakkireddy.

27) NAPI support for ath10k, from Rajkumar Manoharan.

28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed.

29) UDP replicast support in TIPC, from Richard Alpe.

30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru.

31) Support BQL in thunderx driver, from Sunil Goutham.

32) TSO support in alx driver, from Tobias Regnery.

33) Add stream parser engine and use it in kcm.

34) Support async DHCP replies in ipconfig module, from Uwe
Kleine-König.

35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits)
mlxsw: switchx2: Fix misuse of hard_header_len
mlxsw: spectrum: Fix misuse of hard_header_len
net/faraday: Stop NCSI device on shutdown
net/ncsi: Introduce ncsi_stop_dev()
net/ncsi: Rework the channel monitoring
net/ncsi: Allow to extend NCSI request properties
net/ncsi: Rework request index allocation
net/ncsi: Don't probe on the reserved channel ID (0x1f)
net/ncsi: Introduce NCSI_RESERVED_CHANNEL
net/ncsi: Avoid unused-value build warning from ia64-linux-gcc
net: Add netdev all_adj_list refcnt propagation to fix panic
net: phy: Add Edge-rate driver for Microsemi PHYs.
vmxnet3: Wake queue from reset work
i40e: avoid NULL pointer dereference and recursive errors on early PCI error
qed: Add RoCE ll2 & GSI support
qed: Add support for memory registeration verbs
qed: Add support for QP verbs
qed: PD,PKEY and CQ verb support
qed: Add support for RoCE hw init
qede: Add qedr framework
...


Revision tags: v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4
# 16217dc7 14-Sep-2016 Thomas Gleixner <tglx@linutronix.de>

Merge tag 'irqchip-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Merge the first drop of irqchip updates for 4.9 from Marc Zyngier:

- ACPI IORT core code
- IORT support for the GICv3 ITS
- A few GIC cleanups


Revision tags: v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1
# 48433419 17-Aug-2016 David S. Miller <davem@davemloft.net>

Merge branch 'strparser'

Tom Herbert says:

====================
strp: Stream parser for messages

This patch set introduces a utility for parsing application layer
protocol messages in a TCP stream. This is a generalization of the
mechanism implemented in the Kernel Connection Multiplexor (KCM).

This patch set adapts KCM to use the strparser. We expect that kTLS
can use this mechanism also. RDS would probably be another candidate
to use a common stream parsing mechanism.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function. The callbacks include
a parse_msg function that is called to perform parsing (e.g.
BPF parsing in case of KCM), and a rcv_msg function that is called
when a full message has been completed.

For strparser we specify the return codes from the parser to allow
the backend to indicate that control of the socket should be
transferred back to userspace to handle some exceptions in the
stream. The return values are:

>0        : length of the successfully parsed message
0         : more data must be received to parse the message
-ESTRPIPE : the current message should not be processed by the
            kernel; return control of the socket to userspace, which
            can proceed to read the messages itself
other < 0 : error in parsing; give control back to userspace,
            assuming that synchronization is lost and the stream
            is unrecoverable (application expected to close TCP socket)
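
The return-code convention above can be sketched in userspace C. The framing
(a 2-byte big-endian length header followed by the payload, with a zero length
treated as an error) is invented for this example, and parse_len_prefixed(),
strp_classify() and the enum are hypothetical names, not strparser API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86 /* Linux value, for C libraries that lack it */
#endif

/*
 * Example parse callback: returns the full message length once the
 * 2-byte header is available, 0 if more data is needed, or a negative
 * errno value for a malformed header.
 */
static int parse_len_prefixed(const unsigned char *buf, size_t avail)
{
    assert(buf != NULL || avail == 0);
    if (avail < 2)
        return 0;
    int payload = (buf[0] << 8) | buf[1];
    if (payload == 0)
        return -EINVAL;
    return 2 + payload;
}

enum strp_action {
    STRP_HAVE_LENGTH,  /* > 0: message length is known */
    STRP_NEED_MORE,    /* 0: wait for more data */
    STRP_HAND_TO_USER, /* -ESTRPIPE: userspace reads the stream itself */
    STRP_ABORT         /* other < 0: stream is unrecoverable */
};

/* Map a parser return value to the action described in the text. */
static enum strp_action strp_classify(int ret)
{
    if (ret > 0)
        return STRP_HAVE_LENGTH;
    if (ret == 0)
        return STRP_NEED_MORE;
    if (ret == -ESTRPIPE)
        return STRP_HAND_TO_USER;
    return STRP_ABORT;
}
```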

There is one issue I haven't been able to fully resolve. If parse_msg
returns ESTRPIPE (wants control back to userspace) the parser may
already have consumed some bytes of the message. There is no way to
put bytes back into the TCP receive queue and tcp_read_sock does not
allow an easy way to peek messages. In lieu of a better solution, we
return ENODATA on the socket to indicate that the data stream is
unrecoverable (application needs to close socket). This condition
should only happen if an application layer message header is split
across two skbuffs and parsing just the first skbuff wasn't sufficient
to determine that a transfer to userspace is needed.

This patch set contains:

- strparser implementation
- changes to kcm to use strparser
- strparser.txt documentation

v2:
- Add copyright notice to C files
- Remove GPL module license from strparser.c
- Add report of rxpause

v3:
- Restore GPL module license
- Use EXPORT_SYMBOL_GPL

v4:
- Removed unused function, changed another to be static as suggested
by davem
- Reworked data_ready to be called from upper layer, no longer requires
taking over socket data_ready callback as suggested by Lance Chao

Tested:
- Ran a KCM thrash test for 24 hours. No behavioral or performance
differences observed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>



Revision tags: v4.7.1, v4.4.18
# 43a0c675 15-Aug-2016 Tom Herbert <tom@herbertland.com>

strparser: Stream parser for messages

This patch introduces a utility for parsing application layer protocol
messages in a TCP stream. This is a generalization of the mechanism
implemented for the Kernel Connection Multiplexor (KCM).

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function.

A stream parser instance is defined by a strparser structure that
is bound to a TCP socket. The function to initialize the structure
is:

int strp_init(struct strparser *strp, struct sock *csk,
              struct strp_callbacks *cb);

csk is the TCP socket being bound to and cb are the parser callbacks.

The upper layer calls strp_tcp_data_ready when data is ready on the lower
socket for strparser to process. This should be called from a data_ready
callback that is set on the socket:

void strp_tcp_data_ready(struct strparser *strp);

A parser is bound to a TCP socket by setting the data_ready function to
strp_tcp_data_ready so that all receive indications on the socket
go through the parser. This assumes that sk_user_data is set to
the strparser structure.

There are four callbacks.
- parse_msg is called to parse the message (returns length or error).
- rcv_msg is called when a complete message has been received
- read_sock_done is called when data_ready function exits
- abort_parser is called to abort the parser

The input to parse_msg is an skbuff which contains the next message under
construction. The backend processing of parse_msg will parse the
application layer protocol headers to determine the length of
the message in the stream. The possible return values are:

>0 : indicates length of successfully parsed message
0 : indicates more data must be received to parse the message
-ESTRPIPE : current message should not be processed by the
kernel, return control of the socket to userspace which
can proceed to read the messages itself
other < 0 : Error in parsing, give control back to userspace
assuming that synchronization is lost and the stream
is unrecoverable (application expected to close TCP socket)
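
As an illustration of the return-value convention above, here is a
minimal userspace sketch of a parse_msg-style function in C. The
three-byte header layout, the DEMO_* names, and demo_parse_msg() are
hypothetical and are not part of strparser; the real callback receives
an skbuff rather than a flat buffer. ESTRPIPE and EBADMSG are taken from
the Linux <errno.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire format, assumed for illustration only:
 *   byte 0     message type (0x01 = data, 0x02 = hand off to userspace)
 *   bytes 1-2  total message length, big-endian, header included
 */
#define DEMO_HDR_LEN   3
#define DEMO_TYPE_DATA 0x01
#define DEMO_TYPE_USER 0x02

/* Follows the parse_msg return convention listed above. */
static int demo_parse_msg(const uint8_t *buf, size_t len)
{
	size_t msg_len;

	if (len < DEMO_HDR_LEN)
		return 0;		/* more data needed to parse the header */

	msg_len = ((size_t)buf[1] << 8) | buf[2];
	if (msg_len < DEMO_HDR_LEN)
		return -EBADMSG;	/* synchronization lost, unrecoverable */

	switch (buf[0]) {
	case DEMO_TYPE_DATA:
		return (int)msg_len;	/* length of the parsed message */
	case DEMO_TYPE_USER:
		return -ESTRPIPE;	/* userspace should read this message */
	default:
		return -EBADMSG;	/* unknown type, treat as parse error */
	}
}
```

In this sketch a positive return is the full message length taken from
the header, even if the body has not arrived yet; the caller is expected
to keep accumulating data until that many bytes are available.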

In the case of an error return (< 0), strparser will stop the parser
and report an error to userspace. The application must deal
with the error. To handle the error, the strparser is unbound
from the TCP socket. If the error indicates that the TCP stream
is at a recoverable point (-ESTRPIPE), then the application
can read the TCP socket to process the stream. Once the application
has dealt with the exceptions in the stream, it may again bind the
socket to a strparser to continue data operations.

Note that ENODATA may be returned to the application. In this case
parse_msg returned -ESTRPIPE, however strparser was unable to maintain
synchronization of the stream (i.e. some of the message in question
was already read by the parser).

strp_pause and strp_unpause are used to provide flow control. For
instance, if rcv_msg is called but the upper layer can't immediately
consume the message it can hold the message and pause strparser.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>



# cc926387 15-Aug-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued

Backmerge because too many conflicts, and also we need to get at the
latest struct fence patches from Gustavo. Requested by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>



# a2071cd7 10-Aug-2016 Ingo Molnar <mingo@kernel.org>

Merge branch 'linus' into locking/urgent, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


Revision tags: v4.4.17, openbmc-4.4-20160804-1
# 468fc7ed 27-Jul-2016 Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:

1) Unified UDP encapsulation offload methods for drivers, from
Alexander Duyck.

2) Make DSA binding more sane, from Andrew Lunn.

3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
packets as soon as the device sees them, with the option to mirror
the packet on TX via the same interface. From Brenden Blanco and
others.

6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

8) Simplify netlink conntrack entry layout, from Florian Westphal.

9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
xgene: Fix build warning with ACPI disabled.
be2net: perform temperature query in adapter regardless of its interface state
l2tp: Correctly return -EBADF from pppol2tp_getname.
net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
net: ipmr/ip6mr: update lastuse on entry change
macsec: ensure rx_sa is set when validation is disabled
tipc: dump monitor attributes
tipc: add a function to get the bearer name
tipc: get monitor threshold for the cluster
tipc: make cluster size threshold for monitoring configurable
tipc: introduce constants for tipc address validation
net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
MAINTAINERS: xgene: Add driver and documentation path
Documentation: dtb: xgene: Add MDIO node
dtb: xgene: Add MDIO node
drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
drivers: net: xgene: Use exported functions
drivers: net: xgene: Enable MDIO driver
drivers: net: xgene: Add backward compatibility
drivers: net: phy: xgene: Add MDIO driver
...



Revision tags: v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1
# ddbcb794 19-Jul-2016 David S. Miller <davem@davemloft.net>

Merge branch 'ncsi'

Gavin Shan says:

====================
NCSI Support

This series is rebased on David's linux-net git repo ("master" branch). It
adds NCSI stack support to drivers/net/ethernet/faraday/ftgmac100.c. The
implementation is based on the NCSI spec (version: 1.1.0):
https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf

As shown in the following figure and defined in the NCSI spec:

* NC-SI (aka NCSI) is defined as the interface between a (Base)
Management Controller (BMC) and one or multiple Network Interface
Controllers (NIC) on the host side. The interface is responsible for
providing external network connectivity for the BMC.
* Each BMC can connect to multiple packages, up to 8. Each package can have
multiple channels, up to 32. Each package and channel is identified by
3 bits and 5 bits, respectively, in the NCSI packet.
* An NCSI packet, encapsulated in an ethernet frame, has 0x88F8 in the
protocol field. The destination MAC address should be all 0xFF bytes,
while the source MAC address can be arbitrary.
* NCSI packets are classified into commands, responses, and AENs
(Asynchronous Event Notifications). Commands are sent from the BMC to the
host (NIC) for configuration and information retrieval. Responses,
corresponding to commands, are sent from the host to the BMC for
confirmation and requested information. Each command should have one and
only one response. An AEN is sent from the host to the BMC for notification
(e.g. link down on the active channel) so that the BMC can take
appropriate action.

+------------------+        +----------------------------------------------+
|                  |        |                     Host                     |
|       BMC        |        |                                              |
|                  |        | +-------------------+  +-------------------+ |
|    +---------+   |        | |     Package-A     |  |     Package-B     | |
|    |         |   |        | +---------+---------+  +-------------------+ |
|    |ftgmac100|   |        | | Channel | Channel |  | Channel | Channel | |
+----+----+----+---+        +-+---------+---------+--+---------+---------+-+
          |                             |                      |
          |                             |                      |
          +-----------------------------+----------------------+
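
The 3-bit package and 5-bit channel identifiers described above can be
sketched in C as follows. This is an illustration only: the DEMO_* macros
and demo_ncsi_* helpers are hypothetical names, and the sketch assumes
both fields share a single byte with the package index in the upper three
bits and the channel index in the lower five.

```c
#include <stdint.h>

/* Assumed layout: bits 7..5 = package index, bits 4..0 = channel index. */
#define DEMO_NCSI_CHANNEL_BITS 5
#define DEMO_NCSI_CHANNEL_MASK 0x1f	/* 5 bits, up to 32 channels */
#define DEMO_NCSI_PACKAGE_MASK 0x07	/* 3 bits, up to 8 packages */

/* Combine a package index and a channel index into one identifier byte. */
static uint8_t demo_ncsi_make_id(uint8_t package, uint8_t channel)
{
	return (uint8_t)(((package & DEMO_NCSI_PACKAGE_MASK)
			  << DEMO_NCSI_CHANNEL_BITS) |
			 (channel & DEMO_NCSI_CHANNEL_MASK));
}

/* Extract the package index from an identifier byte. */
static uint8_t demo_ncsi_package(uint8_t id)
{
	return (id >> DEMO_NCSI_CHANNEL_BITS) & DEMO_NCSI_PACKAGE_MASK;
}

/* Extract the channel index from an identifier byte. */
static uint8_t demo_ncsi_channel(uint8_t id)
{
	return id & DEMO_NCSI_CHANNEL_MASK;
}
```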

The design of the patchset is summarized below:

* The network driver uses 3 interfaces exported from the NCSI stack:
ncsi_register_dev() - Register (create) an associated NCSI device.
ncsi_start_dev() - Bring up the NCSI device.
ncsi_unregister_dev() - Destroy the registered NCSI device.
* There are several data structures introduced for different objects:
struct ncsi_dev - NCSI device seen by network device driver.
struct ncsi_dev_priv - NCSI device seen by NCSI stack.
struct ncsi_package - NCSI package which can have multiple channels.
struct ncsi_channel - NCSI channel.
* The NCSI stack is driven internally by a workqueue and a state machine.
* All available NCSI packages and channels are enumerated (probed) on
the first call to ncsi_start_dev(). The NCSI topology won't change until
the NCSI device is destroyed.
* All available channels are brought up when hardware arbitration is
enabled. Otherwise, only one channel is selected as the active one. Each
channel has 3 states and can be put into a queue to request configuration
or suspension. Channels in the queue in the inactive state will be
configured (brought up), while channels in the queue in the active state
will be suspended (torn down). A channel in the invisible state has its
requested configuration or suspension in progress.
* Failover, in which another inactive channel is selected as the active
one, can happen when hardware arbitration is disabled. It can be caused
by a timeout on the link monitor or by an AEN.
* The NCSI stack should be configurable through netlink or another
mechanism. This is not implemented in this patchset and remains TBD.
* The first NIC driver that is aware of NCSI: drivers/net/ethernet/faraday/ftgmac100.c
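
The channel queue handling described in the list above can be sketched
roughly as follows. This is not the NCSI stack's code: the enum and
function names are hypothetical, and the sketch assumes that a
successfully configured channel becomes active and a suspended channel
becomes inactive.

```c
/* The three channel states named in the design notes. */
enum demo_chan_state {
	DEMO_CHAN_INACTIVE,
	DEMO_CHAN_ACTIVE,
	DEMO_CHAN_INVISIBLE,	/* a request is being applied */
};

enum demo_chan_action {
	DEMO_ACT_NONE,
	DEMO_ACT_CONFIGURE,	/* bring the channel up */
	DEMO_ACT_SUSPEND,	/* tear the channel down */
};

/*
 * Pick the action for a queued channel from its current state and mark
 * the channel invisible while the request is in flight.
 */
static enum demo_chan_action demo_start_request(enum demo_chan_state *state)
{
	enum demo_chan_action act;

	switch (*state) {
	case DEMO_CHAN_INACTIVE:
		act = DEMO_ACT_CONFIGURE;
		break;
	case DEMO_CHAN_ACTIVE:
		act = DEMO_ACT_SUSPEND;
		break;
	default:
		return DEMO_ACT_NONE;	/* request already in progress */
	}

	*state = DEMO_CHAN_INVISIBLE;
	return act;
}

/* Assumed completion step: leave the invisible state once the action is done. */
static void demo_finish_request(enum demo_chan_state *state,
				enum demo_chan_action act)
{
	if (act == DEMO_ACT_CONFIGURE)
		*state = DEMO_CHAN_ACTIVE;
	else if (act == DEMO_ACT_SUSPEND)
		*state = DEMO_CHAN_INACTIVE;
}
```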

Changelog
=========
v2 -> v3:
* Include (one line) change in include/uapi/linux/if_ether.h to fix build
error.
v1 -> v2:
* Support NCSI spec v1.1.0 (3 more commands and 4 hardware arbitration
modes added).
* Enable AEN packets according to the supported list.
* Introduce NCSI channel states and processing queue in order to support
the hardware arbitration.
* The hardware arbitration is supported (tested with emulated environment).
* Introduce link monitor with GLS (Get Link Status) command/response as part
of the error handling defined in NCSI spec.
* Support IPv6 address discovery when CONFIG_IPV6 is enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>


