#
36f87780 |
| 24-Jan-2017 |
David S. Miller <davem@davemloft.net> |
Merge branch 'packet-sampling-offload'
Jiri Pirko says:
====================
Add support for offloading packet-sampling
Yotam says:
The first patch introduces the psample module, a netlink channel dedicated to packet sampling implemented using generic netlink. This module provides a generic way for kernel modules to sample packets, while not being tied to any specific subsystem like NFLOG.
The second patch adds the sample tc action, which uses psample to randomly sample packets that match a classifier. The user can configure the psample group number, the sampling rate and the packet's truncation (to save kernel-user traffic).
The last two patches add the support for offloading the matchall-sample tc command in the mlxsw driver, for ingress qdiscs.
An example for psample usage can be found in the libpsample project at: https://github.com/Mellanox/libpsample
v1->v2:
- Reword first patch's commit message
- Fix typo in comment in second patch
- Change order of tc_sample uapi enum to match convention
- Rename act_sample action callback tcf_sample -> tcf_sample_act
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ae0a628 |
| 23-Jan-2017 |
Yotam Gigi <yotamg@mellanox.com> |
net: Introduce psample, a new genetlink channel for packet sampling
Add a general way for kernel modules to sample packets, without being tied to any specific subsystem. This netlink channel can be used by tc, iptables, etc. and allows packet sampling in the kernel to be standardized.
For every sampled packet, the psample module adds the following metadata fields:
PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable
PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable
PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been truncated during sampling
PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the user who initiated the sampling. This field allows users to differentiate between several samplers working simultaneously and to filter the packets relevant to them
PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The sequence is kept for each group
PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets
PSAMPLE_ATTR_DATA - the actual packet bits
The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast group. In addition, add the GET_GROUPS netlink command which allows the user to see the current sample groups, their refcount and sequence number. This command currently supports only netlink dump mode.
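The metadata fields listed above can be pictured with a small userspace model. This is only a sketch: `struct sample_group`, `struct sample_meta` and `sample_packet()` are hypothetical names, not the kernel's implementation, and the choice that the first sample of a group carries sequence number 0 is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a sample group; not the kernel structure. */
struct sample_group {
	uint32_t group_num; /* reported as PSAMPLE_ATTR_SAMPLE_GROUP */
	uint32_t seq;       /* reported as PSAMPLE_ATTR_GROUP_SEQ */
};

/* Hypothetical container for the metadata of one sampled packet. */
struct sample_meta {
	int32_t  iifindex;     /* PSAMPLE_ATTR_IIFINDEX */
	int32_t  oifindex;     /* PSAMPLE_ATTR_OIFINDEX */
	uint32_t origsize;     /* PSAMPLE_ATTR_ORIGSIZE */
	uint32_t sample_group; /* PSAMPLE_ATTR_SAMPLE_GROUP */
	uint32_t group_seq;    /* PSAMPLE_ATTR_GROUP_SEQ */
	uint32_t sample_rate;  /* PSAMPLE_ATTR_SAMPLE_RATE */
	uint32_t data_len;     /* length of PSAMPLE_ATTR_DATA after truncation */
};

/*
 * Fill the metadata for one sampled packet.
 * trunc_size == 0 means "do not truncate" in this model.
 */
struct sample_meta sample_packet(struct sample_group *grp, uint32_t pkt_len,
				 uint32_t trunc_size, int32_t iif, int32_t oif,
				 uint32_t rate)
{
	struct sample_meta m;

	assert(grp != NULL);
	m.iifindex = iif;
	m.oifindex = oif;
	m.origsize = pkt_len;
	m.sample_group = grp->group_num;
	m.group_seq = grp->seq++; /* the sequence is kept per group */
	m.sample_rate = rate;
	m.data_len = (trunc_size != 0 && trunc_size < pkt_len) ? trunc_size
							       : pkt_len;
	return m;
}
```

A listener that receives such records can compare `origsize` with `data_len` to detect truncation, and can watch for gaps in `group_seq` to notice lost samples within a group.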
Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62ed8ced |
| 24-Jan-2017 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v4.10-rc5' into for-linus
Sync up with mainline to apply fixup to a commit that came through power supply tree.
|
#
f3a3e248 |
| 09-Jan-2017 |
David S. Miller <davem@davemloft.net> |
Merge branch 'net-smc'
Ursula Braun says:
====================
net/smc: Shared Memory Communications - RDMA
here is now V4 of the SMC-R patches having processed your feedback from end of November. The most important change is the replacement of sysfs by a generic netlink solution in patch 04. And I tried to get rid of the __packed attributes. There are still a few usages left due to SMC-R protocol defined structures.
V4 changes:
The order of patches 03 and 04 for pnet table management and SMC IB-client establishing has been exchanged, since pnet table management is now built on top of smc_ib_devices.
Patch 01:
- Use EXPORT_SYMBOL_GPL().
Patch 02:
- Define "use_fallback" as bool.
- Get rid of useless smc_sock fields clearing in smc_sock_alloc(), since sk_alloc() clears out the memory.
Patch 03:
- Postpone smc_ib_remember_port_attr() call till ib_device is mentioned in the pnet table.
Patch 04:
- Replace sysfs-usage by a generic netlink approach for pnet table configuration.
- Change layout of pnet table entries to reference net_device and ib_device instead of dealing with names of net_devices and ib_devices.
Patch 05:
- Adapt "use_fallback" usages to new type bool.
- Get rid of useless smc_sock fields clearing in smc_sock_alloc()
- Avoid __packed where possible.
- Check if clc responses are not too big.
Patch 09:
- Postpone smc_setup_per_ibdev till the first connection with this ib_device is really created.
Patch 11:
- Get rid of __packed usage.
V3 changes:
Patch 05:
- Remove unneeded DEFINE_WAIT
Patch 06:
- Improve synchronization of link group creation
Patch 07:
- Rename peer_rmbe_len into peer_rmbe_size to be more consistent
Patch 09:
- Avoid calls of ib_get_memory_region with IB_ACCESS_LOCAL_WRITE, use new default local_dma_lkey from protection domain as lkey instead.
- Remove no longer needed function smc_ib_dereg_memory_region().
Patch 14:
- Switch to state ACTIVE only if still in state INIT.
- Return 0 for recvmsg invoked in a socket closing state.
- Allow getname call in state APPCLOSEWAIT1
- Do not trigger destruction of a socket-in-error queued in accept queue.
- During cleanup of accept queue, make sure sockets are destructed, and sockets in fallback mode are handled appropriately.
- When freeing sndbufs/rmbs, remove them from their list and free the entry.
- Use add_wait_queue() and remove_wait_queue() in close wait functions.
- If actively closing a socket in state for PEERFINCLOSEWAIT, keep this state.
- If passively closing a socket while bytes are to be received, move to state APPCLOSEWAIT1.
- If actively aborting a socket, skip sending the close_abort flag, since RDMA communication is no longer possible.
- When terminating a link group, do not schedule link group freeing a 2nd time, since already done when unregistering the last remaining connection.
Patch 15:
- Introduce smc_diag module for monitoring SMC protocol sockets. This replaces the old patch 0015 dealing with procfs.
V2 changes:
Patch 0002:
- Add SMC versions for family key strings in net/core/sock.c.
Patch 0006:
- initialize rb_tree.
Patch 0007:
- Get rid of unneeded use of xchg() in smc_sndbuf_unuse() and smc_rmb_unuse().
Patch 0008:
- Correct error checking logic for ib_function calls.
- Define struct smc_link field wr_tx_id as atomic_long_t.
- Use "do_div" instead of "%" to be architecture-independent.
Patch 0009:
- Correct error checking logic for ib_function calls.
Patch 0011:
- Remove xchg() calls in cursor handling.
- Use atomic64_t for cursor overlays on 64-bit architectures. If not available, use plain u64 and add locking for cursor reading and writing.
- Implement smc_curs_add() without modulo operator "%".
Patch 0012:
- Remove xchg() calls in cursor handling.
- Implement smc_tx_rdma_writes() without modulo operator "%".
Patch 0013:
- Remove xchg() calls in cursor handling.
Patch 0014:
- Return type bool in smc_wr_tx_has_pending().
- Remove unneeded semicolon in smc_close_shutdown_write().
- Call smc_close_active() in non-fallback case only.
- Get rid of duplicate schedule of sock_put_work().
- Take nested sock_lock in smc_listen_work().
- Start close stream_wait in case of prepared sends only.
Patch 0015:
- Remove unneeded socket ref_count in smc_proc_seq_show().
- Take lock before list_empty check in smc_proc_sock_list_del().
These patches are the initial part of the implementation of the "Shared Memory Communications-RDMA" (SMC-R) protocol as defined in RFC7609 [1]. While SMC-R does not aim to replace TCP, it taps a wealth of existing data center TCP socket applications to become more efficient without the need for rewriting them. SMC-R uses RDMA over Converged Ethernet (RoCE) to save CPU consumption. For instance, when running 10 parallel connections with uperf, we measured a decrease of 60% in CPU consumption with SMC-R compared to TCP/IP (with throughput and latency comparable; measured on x86_64 with the same RoCE card and port).
SMC-R does not require an RDMA communication manager (RDMA CM).
SMC-R inherits TCP qualities such as reliable connections, host-based firewall packet filtering (on connection establishment) and unmodified application of communication encryption such as TLS (transport layer security) or SSL (secure sockets layer). Since original TCP is used to establish SMC-R connections, load balancers and packet inspection based on TCP/IP connection establishment continue to work for SMC-R.
On the other hand, using SMC-R implies:
- either involving a preload library when invoking the unchanged TCP-application or slightly modifying the source by simply changing the socket family in the socket() call
- accepting extra overhead and latency in connection establishment due to SMC Connection Layer Control (CLC) handshake
- explicit coupling of RoCE ports with Ethernet ports
- not routable as currently built on RoCE V1
- bypassing of packet-based networking features
    - filtering (netfilter)
    - sniffing (libpcap, packet sockets, (E)BPF)
    - traffic control (scheduling, shaping)
- bypassing of IP-header based socket options
- bypassing of memory buffer (pressure) management
- unusable together with IPsec
Overview of the SMC-R Protocol described in informational RFC 7609
SMC-R is an open protocol that provides RDMA capabilities over RoCE transparently for applications exploiting TCP sockets. A new socket protocol family PF_SMC is introduced. There are no changes required to applications using the sockets API for TCP stream sockets other than the specification of the new socket family AF_SMC. Unmodified applications can be used by means of a dynamic preload shared library which rewrites the socket API call socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) into socket(AF_SMC, SOCK_STREAM, IPPROTO_TCP). SMC-R re-uses the address family AF_INET for all addressing purposes around struct sockaddr.
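The socket-family rewrite described above can be sketched as follows. This is an illustration under assumptions, not the preload library itself: `smc_rewrite_family()` and `open_stream_socket()` are hypothetical helper names, the fallback `#define` assumes the Linux value 43 for AF_SMC when the C library headers do not provide it, and whether a given kernel accepts IPPROTO_TCP together with AF_SMC may depend on the kernel version.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43 /* assumed Linux value, used only if the headers lack it */
#endif

/*
 * Hypothetical helper: pick the address family the way the text describes
 * the preload library doing it. Only TCP stream sockets over AF_INET are
 * redirected; everything else is left untouched.
 */
int smc_rewrite_family(int domain, int type, int protocol)
{
	if (domain == AF_INET && type == SOCK_STREAM && protocol == IPPROTO_TCP)
		return AF_SMC;
	return domain;
}

/*
 * Hypothetical helper: try an SMC socket first and fall back to plain TCP
 * if the kernel rejects the request (e.g. SMC support is not available).
 * Returns a file descriptor, or -1 with errno set.
 */
int open_stream_socket(void)
{
	int family = smc_rewrite_family(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	int fd = socket(family, SOCK_STREAM, IPPROTO_TCP);

	if (fd < 0)
		fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}
```

Because addressing still uses AF_INET, the `struct sockaddr_in` passed to bind() or connect() on the returned descriptor would stay the same in either case.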
SMC-R system architecture layers:
+=============================================================================+
|                                      | unmodified TCP application           |
| native SMC application               +--------------------------------------+
|                                      | dynamic preload shared library       |
+=============================================================================+
|                                 SMC socket                                  |
+-----------------------------------------------------------------------------+
|                    | TCP socket (for connection establishment and fallback) |
| IB verbs           +--------------------------------------------------------+
|                    | IP                                                     |
+--------------------+--------------------------------------------------------+
| RoCE device driver | some network device driver                             |
+=============================================================================+
Terms:
A link group is determined by an ordered peer pair of TCP client and TCP server (IP addresses and subnet). Reversing the client and server roles results in a separate link group.
A link is a logical point-to-point connection based on an infiniband reliable connected queue pair (RC-QP) between two RoCE ports (MACs and GIDs) of a peer pair. A link group can have 1..8 links for failover and load balancing. This initial Linux implementation always has 1 link per link group.
Each link group on a peer can have 1..255 remote memory buffers (RMBs). If more RMBs are needed, a peer can open another link group (this initial Linux implementation) or fall back to TCP. Each RMB has its own particular size and its own (R)DMA mapping and credentials (rtoken consisting of rkey and RDMA "virtual address"). This initial Linux implementation uses physically contiguous memory for RMBs but we are working towards scattered memory because of memory fragmentation.
Each RMB has 1..255 RMB elements (RMBEs) of equal size to provide multiplexing of connections within an RMB. An RMBE is the RDMA Write destination organized as wrapping ring buffer for data transmit of a particular connection in one direction (duplex by means of mirror symmetry as with TCP). This initial Linux implementation always has 1 RMBE per RMB and thus an individual RMB for each connection.
SMC-R connection establishment with subsequent data transfer:
CLIENT                                                   SERVER

TCP three-way handshake:
                     regular TCP SYN
    -------------------------------------------------------->
                   regular TCP SYN ACK
    <--------------------------------------------------------
                     regular TCP ACK
    -------------------------------------------------------->

SMC Connection Layer Control (CLC) handshake
exchanges RDMA credentials between peers:
         via above TCP connection: SMC CLC Proposal
    -------------------------------------------------------->
          via above TCP connection: SMC CLC Accept
    <--------------------------------------------------------
         via above TCP connection: SMC CLC Confirm
    -------------------------------------------------------->

SMC Link Layer Control (LLC) (only once per link, i.e. 1st conn. of link group):
             RoCE RC-QP: SMC LLC Confirm Link
    <========================================================
         RoCE RC-QP: SMC LLC Confirm Link response
    ========================================================>

SMC data transmission (incl. SMC Connection Data Control (CDC) message):
                  RoCE RC-QP: RDMA Write
    ========================================================>
        RoCE RC-QP: SMC CDC message (flow control)
    ========================================================>
                           ...

                  RoCE RC-QP: RDMA Write
    <========================================================
        RoCE RC-QP: SMC CDC message (flow control)
    <========================================================
                           ...
Data flow within an established connection:
+----------------------------------------------------------------------------
| SENDER
| sendmsg()
|    |
|    | produces into sndbuf [sender's process context]
|    v
| +--------+
| | sndbuf | [ring buffer]
| +--------+
|    |
|    | consumes from sndbuf and produces into receiver's RMBE [any context]
|    | by sending RDMA Write followed by SMC CDC message over RoCE RC-QP
|    |
+----|-----------------------------------------------------------------------
     |
+----|-----------------------------------------------------------------------
|    v RECEIVER
| +------+
| | RMBE | [ring buffer, can have size different from sender's sndbuf]
| |      | [RMBE represents rcvbuf, no further de-coupling as on sender side]
| +------+
|    |
|    | consumes from RMBE [receiver's process context]
|    v
| recvmsg()
+----------------------------------------------------------------------------
Flow control ("cursor" updates) by means of SMC CDC messages:
SENDER RECEIVER
sends updates via CDC-------------+ sends updates via CDC on consuming from sndbuf | on consuming from RMBE and producing into RMBE | by means of recvmsg() | | | | +-----------------------------------|------------+ | | +--v-------------------------+ +--v-----------------------+ | receiver's consumer cursor | | sender's producer cursor----+ +----------------|-----------+ +--------------------------+ | | | | receiver's RMBE | | +--------------------------+ | | | | | +--------------------------------+ | | | | | | | v | | | +------------| | |-------------+////////////| | |//RDMA data written by////| | |////sender that is////////| | |/available to be consumed/| | |///////// +---------------| | |----------+^ | | | | | | | +-----------------+ | | +--------------------------+
Sending updates of the producer cursor is immediate for low latency; something like Nagle's algorithm (absence of TCP_NODELAY) is optional and currently not part of this initial Linux implementation. Sending updates of the consumer cursor is conditional to avoid the silly window syndrome.
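The cursor bookkeeping behind this flow control can be sketched in a few lines. The following is a simplified userspace model, not the kernel code: `struct ring_cursor`, `cursor_add()` and `cursor_diff()` are hypothetical names, and the model assumes that a single advance never exceeds the buffer size and that the newer cursor is at most one wrap ahead of the older one. Wrapping is handled by comparison and subtraction rather than the modulo operator "%", in the spirit of the V2 changelog entry for smc_curs_add().

```c
#include <stdint.h>

/* Hypothetical model of a ring-buffer cursor: a wrap counter plus offset. */
struct ring_cursor {
	uint16_t wrap;  /* how many times the cursor has wrapped around */
	uint32_t count; /* byte offset inside the buffer, always < size */
};

/* Advance the cursor by value bytes (value <= size) without using "%". */
void cursor_add(uint32_t size, struct ring_cursor *curs, uint32_t value)
{
	curs->count += value;
	if (curs->count >= size) {
		curs->wrap++;
		curs->count -= size;
	}
}

/*
 * Number of bytes between an older and a newer cursor of the same ring.
 * With older = consumer cursor and newer = producer cursor this is the
 * amount of data available to be consumed.
 */
uint32_t cursor_diff(uint32_t size, const struct ring_cursor *older,
		     const struct ring_cursor *newer)
{
	if (older->wrap != newer->wrap)
		return (size - older->count) + newer->count;
	return newer->count - older->count;
}
```

In this model the sender's free space in the receiver's RMBE is `size - cursor_diff(size, &consumer, &producer)`, which is why the sender needs the receiver's consumer cursor updates before it can write further data.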
Normal connection termination:
Normal connection termination starts transitioning from socket state ACTIVE via either "Active Close" or "Passive Close".
shutdown rdwr +-----------------+ or close, +-------------->| INIT / CLOSED |<-------------+ send PeerCon|nClosed +-----------------+ | PeerConnClosed | | | received | connection | established | | V | +----------------+ +-----------------+ +----------------+ |AppFinCloseWait | | ACTIVE | |PeerFinCloseWait| +----------------+ +-----------------+ +----------------+ | | | | | Active Close: | |Passive Close: | | close or | |PeerConnClosed or | | shutdown wr or| |PeerDoneWriting | | shutdown rdwr | |received |
| V V | PeerConnClo|sed +--------------+ +-------------+ | close or received +--<----|PeerCloseWait1| |AppCloseWait1|--->----+ shutdown rdwr, | +--------------+ +-------------+ | send | PeerDoneWri|ting | shutdown wr, | PeerConnClosed | received | send Pee|rDoneWriting | | V V | | +--------------+ +-------------+ | +--<----|PeerCloseWait2| |AppCloseWait2|--->----+ +--------------+ +-------------+
In state CLOSED, the socket can only be destructed once the application has issued a close().
Abnormal connection termination:
+-----------------+ +-------------->| INIT / CLOSED |<-------------+ | +-----------------+ | | | | +-----------------------+ | | | Any state | | PeerConnAbo|rt | (before setting | | send received | | PeerConnClosed | | PeerConnAbort | | indicator in | | | | peer's RMBE) | | | +-----------------------+ | | | | | | Active Abort: | | Passive Abort: | | problem, | | PeerConnAbort | | send | | received, | | PeerConnAbort,| | ECONNRESET | | ECONNABORTED | | | | V V | | +--------------+ +--------------+ | +-------|PeerAbortWait | | ProcessAbort |------+ +--------------+ +--------------+
Implementation notes beyond RFC 7609:
A PNET table, configured via generic netlink since V4 (previously via sysfs), provides the mapping between network device names and RoCE Infiniband device names for the transparent switch of data communication. A PNET table can contain an arbitrary number of PNETIDs. Each PNETID contains exactly one (Ethernet) network device name and one or more RoCE Infiniband device names. Each device name can only exist in at most one PNETID (no overlapping). This initial Linux implementation allows at most one RoCE Infiniband device name per PNETID. After a new TCP connection is established, the network device name used for egress traffic with the TCP connection's local source IP address is used as key to look up the unique PNETID, and the RoCE Infiniband device of this PNETID is used to switch data communication from TCP to RDMA during SMC CLC handshake.
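The lookup described above amounts to a keyed search over the table. The sketch below is a userspace illustration only: `struct pnet_entry` and `pnet_find_ibdev()` are hypothetical names, the table is a plain array rather than whatever structure the kernel uses, and each entry holds one RoCE device, matching the stated limit of the initial implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one PNET table entry. */
struct pnet_entry {
	const char *pnetid; /* identifier of the physical network */
	const char *netdev; /* (Ethernet) network device name */
	const char *ibdev;  /* RoCE Infiniband device name */
};

/*
 * Return the RoCE device coupled with the given network device, or NULL
 * if the device is not part of any PNETID (the connection would then stay
 * on TCP in this model).
 */
const char *pnet_find_ibdev(const struct pnet_entry *tbl, size_t n,
			    const char *netdev)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].netdev, netdev) == 0)
			return tbl[i].ibdev;
	}
	return NULL;
}
```

Since each device name may appear in at most one PNETID, the first match is the only match, so the linear search can stop as soon as it finds the key.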
Problem determination:
A protocol dissector is available with upstream wireshark for formatting SMC-R related RoCE LAN traffic. [https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-smcr.c]
We are working on enhancing the Linux implementation to cover:
- Improve default socket closing asynchronicity
- Address corner cases with many parallel connections
- Tracing
- Integrated load balancing and fail-over within a link group
- Splice and sendpage support
- IPv6 addressing support
- Keepalive, Cork
- Namespaces support
- Urgent data
- More socket options
- Diagnostics
- Statistics support
- SNMP support
References:
[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac713874 |
| 09-Jan-2017 |
Ursula Braun <ubraun@linux.vnet.ibm.com> |
smc: establish new socket family
* enable smc module loading and unloading
* register new socket family
* basic smc socket creation and deletion
* use backing TCP socket to run CLC (Connection Layer Control) handshake of SMC protocol
* Setup for infiniband traffic is implemented in follow-on patches. For now fallback to TCP socket is always used.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f26e8817 |
| 16-Dec-2016 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 4.10 merge window.
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31 |
|
#
712cba5d |
| 07-Nov-2016 |
Max Filippov <jcmvbkbc@gmail.com> |
Merge tag 'v4.9-rc3' into xtensa-for-next
Linux 4.9-rc3
|
#
cc9b9402 |
| 04-Nov-2016 |
Mark Brown <broonie@kernel.org> |
Merge branch 'topic/error' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into regulator-fixed
|
#
9902aa47 |
| 01-Nov-2016 |
Russell King <rmk+kernel@armlinux.org.uk> |
Merge branch 'drm-tda998x-mali' into drm-tda998x-devel
|
Revision tags: v4.4.30, v4.4.29 |
|
#
fe0f59c4 |
| 30-Oct-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Merge back earlier cpufreq material for v4.10.
|
Revision tags: v4.4.28 |
|
#
0fc4f78f |
| 25-Oct-2016 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
Merge remote-tracking branch 'airlied/drm-next' into topic/drm-misc
Backmerge latest drm-next to have a baseline for the s/fence/dma_fence/ patch from Chris.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
|
#
f9bf1d97 |
| 25-Oct-2016 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because Chris Wilson needs the very latest & greatest of Gustavo Padovan's sync_file work, specifically the refcounting changes from:
commit 30cd85dd6edc86ea8d8589efb813f1fad41ef233 Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Date: Wed Oct 19 15:48:32 2016 -0200
dma-buf/sync_file: hold reference to fence when creating sync_file
Also good to sync in general since git tends to get confused with the cherry-picking going on.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
|
Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26 |
|
#
aea98380 |
| 17-Oct-2016 |
Mauro Carvalho Chehab <mchehab@s-opensource.com> |
Merge tag 'v4.9-rc1' into patchwork
Linux 4.9-rc1
* tag 'v4.9-rc1': (13774 commits)
  Linux 4.9-rc1
  score: traps: Add missing include file to fix build error
  fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
  fs/super.c: fix race between freeze_super() and thaw_super()
  overlayfs: Fix setting IOP_XATTR flag
  iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
  CIFS: Retrieve uid and gid from special sid if enabled
  CIFS: Add new mount option to set owner uid and gid from special sids in acl
  qedr: Add events support and register IB device
  qedr: Add GSI support
  qedr: Add LL2 RoCE interface
  qedr: Add support for data path
  qedr: Add support for memory registeration verbs
  qedr: Add support for QP verbs
  qedr: Add support for PD,PKEY and CQ verbs
  qedr: Add support for user context verbs
  qedr: Add support for RoCE HW init
  qedr: Add RoCE driver framework
  pkeys: Remove easily triggered WARN
  MIPS: Wire up new pkey_{mprotect,alloc,free} syscalls
  ...
|
Revision tags: v4.7.8, v4.4.25 |
|
#
4d69f155 |
| 16-Oct-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v4.9-rc1' into x86/fpu, to resolve conflict
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4a7126a2 |
| 13-Oct-2016 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v4.8' into next
Sync up with mainline to bring in I2C host notify changes and other updates.
|
#
8b2ada27 |
| 28-Oct-2016 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes'
* pm-cpufreq-fixes:
  cpufreq: intel_pstate: Always set max P-state in performance mode
  cpufreq: intel_pstate: Set P-state upfront in performance mode
* pm-sleep-fixes:
  PM / suspend: Fix missing KERN_CONT for suspend message
|
#
1d33369d |
| 16-Oct-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v4.9-rc1' into x86/urgent, to pick up updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.4.24, v4.7.7 |
|
#
687ee0ad |
| 05-Oct-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and co. at Google. https://lwn.net/Articles/701165/
2) Do TCP Small Queues for retransmits, from Eric Dumazet.
3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei Starovoitov.
4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai.
5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn.
6) Support GMAC protocol in iwlwifi mvm, from Ayala Beker.
7) Support ndo_poll_controller in mlx5, from Calvin Owens.
8) Move VRF processing to an output hook and allow l3mdev to be loopback, from David Ahern.
9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern.
10) Congestion control in RXRPC, from David Howells.
11) Support geneve RX offload in ixgbe, from Emil Tantilov.
12) When hitting pressure for new incoming TCP data SKBs, perform a partial rather than a full purge of the OFO queue (which could be huge). From Eric Dumazet.
13) Convert XFRM state and policy lookups to RCU, from Florian Westphal.
14) Support RX network flow classification to igb, from Gangfeng Huang.
15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski.
16) New skbmod packet action, from Jamal Hadi Salim.
17) Remove some inefficiencies in snmp proc output, from Jia He.
18) Add FIB notifications to properly propagate route changes to hardware which is doing forwarding offloading. From Jiri Pirko.
19) New dsa driver for qca8xxx chips, from John Crispin.
20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej Żenczykowski.
21) Add L3 mode to ipvlan, from Mahesh Bandewar.
22) Support 802.1ad in mlx4, from Moshe Shemesh.
23) Support hardware LRO in mediatek driver, from Nelson Chang.
24) Add TC offloading to mlx5, from Or Gerlitz.
25) Convert various drivers to ethtool ksettings interfaces, from Philippe Reynes.
26) TX max rate limiting for cxgb4, from Rahul Lakkireddy.
27) NAPI support for ath10k, from Rajkumar Manoharan.
28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed.
29) UDP replicast support in TIPC, from Richard Alpe.
30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru.
31) Support BQL in thunderx driver, from Sunil Goutham.
32) TSO support in alx driver, from Tobias Regnery.
33) Add stream parser engine and use it in kcm.
34) Support async DHCP replies in ipconfig module, from Uwe Kleine-König.
35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits)
  mlxsw: switchx2: Fix misuse of hard_header_len
  mlxsw: spectrum: Fix misuse of hard_header_len
  net/faraday: Stop NCSI device on shutdown
  net/ncsi: Introduce ncsi_stop_dev()
  net/ncsi: Rework the channel monitoring
  net/ncsi: Allow to extend NCSI request properties
  net/ncsi: Rework request index allocation
  net/ncsi: Don't probe on the reserved channel ID (0x1f)
  net/ncsi: Introduce NCSI_RESERVED_CHANNEL
  net/ncsi: Avoid unused-value build warning from ia64-linux-gcc
  net: Add netdev all_adj_list refcnt propagation to fix panic
  net: phy: Add Edge-rate driver for Microsemi PHYs.
  vmxnet3: Wake queue from reset work
  i40e: avoid NULL pointer dereference and recursive errors on early PCI error
  qed: Add RoCE ll2 & GSI support
  qed: Add support for memory registeration verbs
  qed: Add support for QP verbs
  qed: PD,PKEY and CQ verb support
  qed: Add support for RoCE hw init
  qede: Add qedr framework
  ...
|
Revision tags: v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4 |
|
#
16217dc7 |
| 14-Sep-2016 |
Thomas Gleixner <tglx@linutronix.de> |
Merge tag 'irqchip-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Merge the first drop of irqchip updates for 4.9 from Marc Zyngier:
- ACPI IORT core code
- IORT support for the GICv3 ITS
- A few GIC cleanups
|
Revision tags: v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1 |
|
#
48433419 |
| 17-Aug-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'strparser'
Tom Herbert says:
====================
strp: Stream parser for messages
This patch set introduces a utility for parsing application layer protocol messages in a TCP stream. This is a generalization of the mechanism implemented in the Kernel Connection Multiplexor (KCM).
This patch set adapts KCM to use the strparser. We expect that kTLS can use this mechanism also. RDS would probably be another candidate to use a common stream parsing mechanism.
The API includes a context structure, a set of callbacks, utility functions, and a data ready function. The callbacks include a parse_msg function that is called to perform parsing (e.g. BPF parsing in case of KCM), and a rcv_msg function that is called when a full message has been completed.
For strparser we specify the return codes from the parser to allow the backend to indicate that control of the socket should be transferred back to userspace to handle some exceptions in the stream: The return values are:
>0 : indicates length of successfully parsed message
0 : indicates more data must be received to parse the message
-ESTRPIPE : current message should not be processed by the kernel, return control of the socket to userspace which can proceed to read the messages itself
other < 0 : Error in parsing, give control back to userspace assuming that synchronization is lost and the stream is unrecoverable (application expected to close TCP socket)
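To make the return-code convention concrete, here is a userspace sketch of a parse function for a made-up framing format. Everything about the format is an assumption for illustration: a 2-byte big-endian length prefix, a hypothetical escape value `ESCAPE_LEN` that hands the stream to userspace, and a hypothetical size limit `MAX_MSG_LEN`. The real parse_msg callback operates on kernel socket buffers, not on a flat byte array.

```c
#include <errno.h>
#include <stddef.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86 /* assumed Linux value, for systems whose errno.h lacks it */
#endif

#define MAX_MSG_LEN 4096  /* hypothetical upper bound on the payload size */
#define ESCAPE_LEN 0xFFFF /* hypothetical marker: let userspace take over */

/*
 * Inspect the bytes received so far and classify them using the return
 * convention quoted above:
 *   > 0        total length of the message (header plus payload)
 *   0          header incomplete, more data must be received
 *   -ESTRPIPE  message should be read by userspace instead
 *   other < 0  framing error, stream considered unrecoverable
 */
int parse_len_prefixed(const unsigned char *buf, size_t len)
{
	size_t payload;

	if (len < 2)
		return 0;

	payload = ((size_t)buf[0] << 8) | buf[1];
	if (payload == ESCAPE_LEN)
		return -ESTRPIPE;
	if (payload > MAX_MSG_LEN)
		return -EMSGSIZE;

	return (int)(2 + payload);
}
```

Note that the positive return value is the full message length even when fewer bytes have arrived so far; in this sketch it is the caller's job to keep collecting data until that many bytes are available.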
There is one issue I haven't been able to fully resolve. If parse_msg returns -ESTRPIPE (wants control back to userspace) the parser may already have consumed some bytes of the message. There is no way to put bytes back into the TCP receive queue and tcp_read_sock does not allow an easy way to peek messages. In lieu of a better solution, we return ENODATA on the socket to indicate that the data stream is unrecoverable (application needs to close socket). This condition should only happen if an application layer message header is split across two skbuffs and parsing just the first skbuff wasn't sufficient to determine that transfer to userspace is needed.
This patch set contains:
- strparser implementation
- changes to kcm to use strparser
- strparser.txt documentation
v2:
- Add copyright notice to C files
- Remove GPL module license from strparser.c
- Add report of rxpause
v3:
- Restore GPL module license
- Use EXPORT_SYMBOL_GPL
v4:
- Removed unused function, changed another to be static as suggested by davem
- Reworked data_ready to be called from upper layer, no longer requires taking over socket data_ready callback as suggested by Lance Chao
Tested:
- Ran a KCM thrash test for 24 hours. No behavioral or performance differences observed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.7.1, v4.4.18 |
|
#
43a0c675 |
| 15-Aug-2016 |
Tom Herbert <tom@herbertland.com> |
strparser: Stream parser for messages
This patch introduces a utility for parsing application layer protocol messages in a TCP stream. This is a generalization of the mechanism implemented in Kernel Connection Multiplexor (KCM).
The API includes a context structure, a set of callbacks, utility functions, and a data ready function.
A stream parser instance is defined by a strparser structure that is bound to a TCP socket. The function to initialize the structure is:
int strp_init(struct strparser *strp, struct sock *csk, struct strp_callbacks *cb);
csk is the TCP socket being bound to and cb are the parser callbacks.
The upper layer calls strp_tcp_data_ready when data is ready on the lower socket for strparser to process. This should be called from a data_ready callback that is set on the socket:
void strp_tcp_data_ready(struct strparser *strp);
A parser is bound to a TCP socket by setting the data_ready function to strp_tcp_data_ready so that all receive indications on the socket go through the parser. This assumes that sk_user_data is set to the strparser structure.
There are four callbacks.
  - parse_msg is called to parse the message (returns length or error).
  - rcv_msg is called when a complete message has been received
  - read_sock_done is called when data_ready function exits
  - abort_parser is called to abort the parser
The input to parse_msg is an skbuff which contains the next message under construction. The backend processing of parse_msg will parse the application layer protocol headers to determine the length of the message in the stream. The possible return values are:
  >0 : indicates length of successfully parsed message
  0 : indicates more data must be received to parse the message
  -ESTRPIPE : current message should not be processed by the kernel, return control of the socket to userspace which can proceed to read the messages itself
  other < 0 : error in parsing, give control back to userspace assuming that synchronization is lost and the stream is unrecoverable (application expected to close TCP socket)
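The return-value contract above can be illustrated with a small userspace sketch. This is not the kernel callback: the real parse_msg receives an skbuff, while the hypothetical demo_parse_msg below takes a plain byte buffer. The framing (one magic byte, one type byte, a 2-byte big-endian payload length) and all DEMO_* names are assumptions made up for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86 /* Linux errno value; fallback for hosts that lack it */
#endif

/* Hypothetical framing, invented for this sketch:
 * byte 0 = magic, byte 1 = type, bytes 2-3 = big-endian payload length. */
#define DEMO_HDR_LEN     4
#define DEMO_MAGIC       0xAB
#define DEMO_TYPE_KERNEL 0x01 /* message the kernel side handles */
#define DEMO_TYPE_USER   0x02 /* message that userspace must read itself */

/* Mirrors the documented return values of parse_msg:
 *   >0         total message length (header + payload)
 *   0          not enough bytes yet to know the length
 *   -ESTRPIPE  hand the stream back to userspace at a message boundary
 *   other <0   parse error, stream considered unrecoverable */
int demo_parse_msg(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    if (len < DEMO_HDR_LEN)
        return 0;               /* header incomplete, wait for more data */
    if (buf[0] != DEMO_MAGIC)
        return -EPROTO;         /* synchronization lost */
    if (buf[1] == DEMO_TYPE_USER)
        return -ESTRPIPE;       /* userspace should take over */
    if (buf[1] != DEMO_TYPE_KERNEL)
        return -EPROTO;         /* unknown message type */

    return DEMO_HDR_LEN + ((buf[2] << 8) | buf[3]);
}
```

A header of `{0xAB, 0x01, 0x00, 0x05}` yields 9 (4 header bytes plus 5 payload bytes), while passing only the first three bytes yields 0, which asks the caller to wait for more data.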
In the case of an error return (< 0) strparser will stop the parser and report an error to userspace. The application must deal with the error. To handle the error the strparser is unbound from the TCP socket. If the error indicates that the TCP stream is at a recoverable point (-ESTRPIPE) then the application can read the TCP socket to process the stream. Once the application has dealt with the exceptions in the stream, it may again bind the socket to a strparser to continue data operations.
Note that ENODATA may be returned to the application. In this case parse_msg returned -ESTRPIPE, however strparser was unable to maintain synchronization of the stream (i.e. some of the message in question was already read by the parser).
strp_pause and strp_unpause are used to provide flow control. For instance, if rcv_msg is called but the upper layer can't immediately consume the message it can hold the message and pause strparser.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc926387 |
| 15-Aug-2016 |
Daniel Vetter <daniel.vetter@ffwll.ch> |
Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Backmerge because too many conflicts, and also we need to get at the latest struct fence patches from Gustavo. Requested by Chris Wilson.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
|
#
a2071cd7 |
| 10-Aug-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into locking/urgent, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.4.17, openbmc-4.4-20160804-1 |
|
#
468fc7ed |
| 27-Jul-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Unified UDP encapsulation offload methods for drivers, from Alexander Duyck.
2) Make DSA binding more sane, from Andrew Lunn.
3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.
4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.
5) Add XDP (eXpress Data Path), essentially running BPF programs on RX packets as soon as the device sees them, with the option to mirror the packet on TX via the same interface. From Brenden Blanco and others.
6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.
7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.
8) Simplify netlink conntrack entry layout, from Florian Westphal.
9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido Schimmel, Yotam Gigi, and Jiri Pirko.
10) Add SKB array infrastructure and convert tun and macvtap over to it. From Michael S Tsirkin and Jason Wang.
11) Support qdisc packet injection in pktgen, from John Fastabend.
12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.
13) Add NV congestion control support to TCP, from Lawrence Brakmo.
14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.
15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.
16) Support MPLS over IPV4, from Simon Horman.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
|
Revision tags: v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1 |
|
#
ddbcb794 |
| 19-Jul-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'ncsi'
Gavin Shan says:
==================== NCSI Support
This series rebases on David's linux-net git repo ("master" branch). It's to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c. The implementation is based on NCSI spec (version: 1.1.0): https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
As the following figure shows and as defined in the NCSI spec:
* The NC-SI (aka NCSI) is defined as the interface between a (Base) Management Controller (BMC) and one or multiple Network Interface Controllers (NIC) on the host side. The interface is responsible for providing external network connectivity for the BMC.
* Each BMC can connect to multiple packages, up to 8. Each package can have multiple channels, up to 32. Packages and channels are identified by 3 bits and 5 bits, respectively, in the NCSI packet.
* An NCSI packet, encapsulated in an ethernet frame, has 0x88F8 in the protocol field. The destination MAC address should be all 0xFF's while the source MAC address can be an arbitrary one.
* NCSI packets are classified into command, response, and AEN (Asynchronous Event Notification). Commands are sent from BMC to host (NIC) for configuration and information retrieval. Responses, corresponding to commands, are sent from host to BMC for confirmation and requested information. One command should have one and only one response. AEN is sent from host to BMC for notification (e.g. link down on active channel) so that BMC can take appropriate action.
   +------------------+        +----------------------------------------------+
   |                  |        |                     Host                     |
   |        BMC       |        |                                              |
   |                  |        | +-------------------+  +-------------------+ |
   |    +---------+   |        | |     Package-A     |  |     Package-B     | |
   |    |         |   |        | +---------+---------+  +-------------------+ |
   |    |ftgmac100|   |        | | Channel | Channel |  | Channel | Channel | |
   +----+----+----+---+        +-+---------+---------+--+---------+---------+-+
             |                             |                      |
             |                             |                      |
             +-----------------------------+----------------------+
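The 3-bit package and 5-bit channel identifiers fit together in a single byte. The sketch below assumes the package ID occupies the upper three bits and the channel ID the lower five; the commit message states only the field widths, so that layout and every demo_* name here are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Ethernet protocol value for NCSI frames, as stated in the text above. */
#define DEMO_ETH_P_NCSI 0x88F8

/* Pack a package index (0-7) and channel index (0-31) into one byte.
 * Assumed layout: bits 7..5 = package, bits 4..0 = channel. */
uint8_t demo_ncsi_make_id(unsigned package, unsigned channel)
{
    assert(package < 8 && channel < 32);
    return (uint8_t)((package << 5) | channel);
}

/* Extract the 3-bit package index from a combined ID. */
unsigned demo_ncsi_package(uint8_t id)
{
    return (id >> 5) & 0x7;
}

/* Extract the 5-bit channel index from a combined ID. */
unsigned demo_ncsi_channel(uint8_t id)
{
    return id & 0x1f;
}
```

With this layout, package 1 and channel 2 combine to 0x22, and the largest pair (package 7, channel 31) fills the byte as 0xFF, which matches the limits of 8 packages and 32 channels per package.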
The design of the patchset is highlighted below:
* The network driver uses 3 interfaces exported from the NCSI stack:
    ncsi_register_dev() - Register (create) an associated NCSI device.
    ncsi_start_dev() - Bring up the NCSI device.
    ncsi_unregister_dev() - Destroy the registered NCSI device.
* There are several data structures introduced for different objects:
    struct ncsi_dev - NCSI device seen by network device driver.
    struct ncsi_dev_priv - NCSI device seen by NCSI stack.
    struct ncsi_package - NCSI package which can have multiple channels.
    struct ncsi_channel - NCSI channel.
* The NCSI stack is driven internally by a workqueue and a state machine.
* All available NCSI packages and channels are enumerated (probed) on the first call to ncsi_start_dev(). The NCSI topology won't change until the NCSI device is destroyed.
* All available channels will be brought up when hardware arbitration is enabled. Otherwise, only one channel is selected as the active one. There are 3 states for each channel, and a channel can be put into a queue to request configuration or suspension. Channels in the queue with the inactive state set will be configured (bringup) while channels in the queue with the active state will be suspended (teardown). While the requested configuration or suspension is being applied, the channel is in the invisible state.
* Failover, where another inactive channel is selected as active, can happen when hardware arbitration is disabled. The failover can be caused by a timeout on the link monitor or by an AEN.
* The NCSI stack should be configurable through netlink or another mechanism; this is not implemented in this patchset. It's something TBD.
* The first NIC driver that is aware of NCSI: drivers/net/ethernet/faraday/ftgmac100.c
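The channel queue handling described above can be pictured as a small state machine. The following userspace sketch is only an illustration under stated assumptions: the demo_* types and functions are hypothetical, and the real stack drives these transitions asynchronously from a workqueue rather than through two direct calls.

```c
#include <assert.h>
#include <stddef.h>

/* The three per-channel states named in the text. */
enum demo_chan_state {
    DEMO_CHAN_INACTIVE,
    DEMO_CHAN_ACTIVE,
    DEMO_CHAN_INVISIBLE  /* a request is currently being applied */
};

enum demo_chan_req {
    DEMO_REQ_NONE,
    DEMO_REQ_CONFIGURE,  /* bring the channel up */
    DEMO_REQ_SUSPEND     /* tear the channel down */
};

struct demo_chan {
    enum demo_chan_state state;
    enum demo_chan_req pending;
};

/* Take a queued channel and start the request implied by its state:
 * inactive channels get configured, active ones get suspended.
 * The channel stays invisible until demo_chan_finish() is called. */
enum demo_chan_req demo_chan_begin(struct demo_chan *ch)
{
    assert(ch != NULL);

    switch (ch->state) {
    case DEMO_CHAN_INACTIVE:
        ch->pending = DEMO_REQ_CONFIGURE;
        break;
    case DEMO_CHAN_ACTIVE:
        ch->pending = DEMO_REQ_SUSPEND;
        break;
    default:
        return DEMO_REQ_NONE; /* already being processed */
    }
    ch->state = DEMO_CHAN_INVISIBLE;
    return ch->pending;
}

/* Complete the pending request and expose the resulting state. */
void demo_chan_finish(struct demo_chan *ch)
{
    assert(ch != NULL);

    if (ch->state != DEMO_CHAN_INVISIBLE)
        return;
    ch->state = (ch->pending == DEMO_REQ_CONFIGURE)
                    ? DEMO_CHAN_ACTIVE : DEMO_CHAN_INACTIVE;
    ch->pending = DEMO_REQ_NONE;
}
```

Splitting the transition into a begin and a finish step keeps the invisible state observable, which is how a second request for the same channel can be rejected while the first is still in flight.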
Changelog
=========
v2 -> v3:
  * Include (one line) change in include/uapi/linux/if_ether.h to fix build error.
v1 -> v2:
  * Support NCSI spec v1.1.0 (3 more commands and 4 hardware arbitration modes added).
  * Enable AEN packets according to the supported list.
  * Introduce NCSI channel states and processing queue in order to support the hardware arbitration.
  * The hardware arbitration is supported (tested with emulated environment).
  * Introduce link monitor with GLS (Get Link Status) command/response as part of the error handling defined in NCSI spec.
  * Support IPv6 address discovery when CONFIG_IPV6 is enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|