Revision tags: v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1 |
|
#
b35725a2 |
| 10-Feb-2015 |
Pravin B Shelar <pshelar@nicira.com> |
openvswitch: Reset key metadata for packet execution.
Userspace packet execute command pass down flow key for given packet. But userspace can skip some parameter with zero value. Therefore kernel ne
openvswitch: Reset key metadata for packet execution.
Userspace packet execute command pass down flow key for given packet. But userspace can skip some parameter with zero value. Therefore kernel needs to initialize key metadata to zero.
Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5 |
|
#
d91641d9 |
| 14-Jan-2015 |
Thomas Graf <tgraf@suug.ch> |
openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
Also factors out Geneve validation code into a new separate function validate_and_copy_geneve_opts().
A subsequent patch will introduce
openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
Also factors out Geneve validation code into a new separate function validate_and_copy_geneve_opts().
A subsequent patch will introduce VXLAN options. Rename the existing GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic tunnel metadata options.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
df8a39de |
| 13-Jan-2015 |
Jiri Pirko <jiri@resnulli.us> |
net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <dave
net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.19-rc4, v3.19-rc3 |
|
#
24cc59d1 |
| 31-Dec-2014 |
Ben Pfaff <blp@nicira.com> |
openvswitch: Consistently include VLAN header in flow and port stats.
Until now, when VLAN acceleration was in use, the bytes of the VLAN header were not included in port or flow byte counters. The
openvswitch: Consistently include VLAN header in flow and port stats.
Until now, when VLAN acceleration was in use, the bytes of the VLAN header were not included in port or flow byte counters. They were however included when VLAN acceleration was not used. This commit corrects the inconsistency, by always including the VLAN header in byte counters.
Previous discussion at http://openvswitch.org/pipermail/dev/2014-December/049521.html
Reported-by: Motonori Shindo <mshindo@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Reviewed-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4 |
|
#
05da5898 |
| 06-Nov-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Add support for OVS_FLOW_ATTR_PROBE.
This new flag is useful for suppressing error logging while probing for datapath features using flow commands. For backwards compatibility reasons
openvswitch: Add support for OVS_FLOW_ATTR_PROBE.
This new flag is useful for suppressing error logging while probing for datapath features using flow commands. For backwards compatibility reasons the commands are executed normally, but error logging is suppressed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
#
12eb18f7 |
| 06-Nov-2014 |
Thomas Graf <tgraf@noironetworks.com> |
openvswitch: Constify various function arguments
Help produce better optimized code.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
Revision tags: v3.18-rc3, v3.18-rc2, v3.18-rc1 |
|
#
25cd9ba0 |
| 06-Oct-2014 |
Simon Horman <horms@verge.net.au> |
openvswitch: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys and execute actions which push, pop, and set labels on packets.
Based heavily on wor
openvswitch: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com> Cc: Leo Alterman <lalterman@nicira.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
#
25ef1328 |
| 17-Oct-2014 |
Pravin B Shelar <pshelar@nicira.com> |
openvswitch: Set flow-key members.
This patch adds missing memset which are required to initialize flow key member. For example for IP flow we need to initialize ip.frag for all cases.
Found by ins
openvswitch: Set flow-key members.
This patch adds missing memset which are required to initialize flow key member. For example for IP flow we need to initialize ip.frag for all cases.
Found by inspection.
This bug is introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac ("openvswitch: Eliminate memset() from flow_extract").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
389f4894 |
| 17-Oct-2014 |
Li RongQing <roy.qing.li@gmail.com> |
openvswitch: fix a use after free
pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp setting after arphdr_ok to avoid the use the freed memory
Fixes: 0714812134d7d ("openvswit
openvswitch: fix a use after free
pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp setting after arphdr_ok to avoid the use the freed memory
Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.") Cc: Jesse Gross <jesse@nicira.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.17 |
|
#
f5796684 |
| 03-Oct-2014 |
Jesse Gross <jesse@nicira.com> |
openvswitch: Add support for Geneve tunneling.
The Openvswitch implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does
openvswitch: Add support for Geneve tunneling.
The Openvswitch implementation is completely agnostic to the options that are in use and can handle newly defined options without further work. It does this by simply matching on a byte array of options and allowing userspace to setup flows on this array.
Signed-off-by: Jesse Gross <jesse@nicira.com> Singed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f0b128c1 |
| 03-Oct-2014 |
Jesse Gross <jesse@nicira.com> |
openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure.
Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets is the same. However, as a
openvswitch: Wrap struct ovs_key_ipv4_tunnel in a new structure.
Currently, the flow information that is matched for tunnels and the tunnel data passed around with packets is the same. However, as additional information is added this is not necessarily desirable, as in the case of pointers.
This adds a new structure for tunnel metadata which currently contains only the existing struct. This change is purely internal to the kernel since the current OVS_KEY_ATTR_IPV4_TUNNEL is simply a compressed version of OVS_KEY_ATTR_TUNNEL that is translated at flow setup.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
07148121 |
| 03-Oct-2014 |
Jesse Gross <jesse@nicira.com> |
openvswitch: Eliminate memset() from flow_extract.
As new protocols are added, the size of the flow key tends to increase although few protocols care about all of the fields. In order to optimize th
openvswitch: Eliminate memset() from flow_extract.
As new protocols are added, the size of the flow key tends to increase although few protocols care about all of the fields. In order to optimize this for hashing and matching, OVS uses a variable length portion of the key. However, when fields are extracted from the packet we must still zero out the entire key.
This is no longer necessary now that OVS implements masking. Any fields (or holes in the structure) which are not part of a given protocol will be by definition not part of the mask and zeroed out during lookup. Furthermore, since masking already uses variable length keys this zeroing operation automatically benefits as well.
In principle, the only thing that needs to be done at this point is remove the memset() at the beginning of flow. However, some fields assume that they are initialized to zero, which now must be done explicitly. In addition, in the event of an error we must also zero out corresponding fields to signal that there is no valid data present. These increase the total amount of code but very little of it is executed in non-error situations.
Removing the memset() reduces the profile of ovs_flow_extract() from 0.64% to 0.56% when tested with large packets on a 10G link.
Suggested-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.17-rc7, v3.17-rc6 |
|
#
971427f3 |
| 15-Sep-2014 |
Andy Zhou <azhou@nicira.com> |
openvswitch: Add recirc and hash action.
Recirc action allows a packet to reenter openvswitch processing. currently openvswitch lookup flow for packet received and execute set of actions on that pac
openvswitch: Add recirc and hash action.
Recirc action allows a packet to reenter openvswitch processing. currently openvswitch lookup flow for packet received and execute set of actions on that packet, with help of recirc action we can process/modify the packet and recirculate it back in openvswitch for another pass.
OVS hash action calculates 5-tupple hash and set hash in flow-key hash. This can be used along with recirculation for distributing packets among different ports for bond devices. For example: OVS bonding can use following actions: Match on: bond flow; Action: hash, recirc(id) Match on: recirc-id == id and hash lower bits == a; Action: output port_bond_a
Signed-off-by: Andy Zhou <azhou@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
#
8c8b1b83 |
| 15-Sep-2014 |
Pravin B Shelar <pshelar@nicira.com> |
openvswitch: Use tun_key only for egress tunnel path.
Currently tun_key is used for passing tunnel information on ingress and egress path, this cause confusion. Following patch removes its use on i
openvswitch: Use tun_key only for egress tunnel path.
Currently tun_key is used for passing tunnel information on ingress and egress path, this cause confusion. Following patch removes its use on ingress path make it egress only parameter.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
show more ...
|
#
83c8df26 |
| 15-Sep-2014 |
Pravin B Shelar <pshelar@nicira.com> |
openvswitch: refactor ovs flow extract API.
OVS flow extract is called on packet receive or packet execute code path. Following patch defines separate API for extracting flow-key in packet execute
openvswitch: refactor ovs flow extract API.
OVS flow extract is called on packet receive or packet execute code path. Following patch defines separate API for extracting flow-key in packet execute code path.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
show more ...
|
Revision tags: v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2 |
|
#
8c6b00c8 |
| 17-Aug-2014 |
Andreea-Cristina Bernat <bernat.ada@gmail.com> |
net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer()
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommend
net/openvswitch/flow.c: Replace rcu_dereference() with rcu_access_pointer()
The "rcu_dereference()" call is used directly in a condition. Since its return value is never dereferenced it is recommended to use "rcu_access_pointer()" instead of "rcu_dereference()". Therefore, this patch makes the replacement.
The following Coccinelle semantic patch was used: @@ @@
( if( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} | while( (<+... - rcu_dereference + rcu_access_pointer (...) ...+>)) {...} )
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.17-rc1, v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5 |
|
#
ad552007 |
| 06-May-2014 |
Ben Pfaff <blp@nicira.com> |
openvswitch: Fix tracking of flags seen in TCP flows.
Flow statistics need to take into account the TCP flags from the packet currently being processed (in 'key'), not the TCP flags matched by the f
openvswitch: Fix tracking of flags seen in TCP flows.
Flow statistics need to take into account the TCP flags from the packet currently being processed (in 'key'), not the TCP flags matched by the flow found in the kernel flow table (in 'flow').
This bug made the Open vSwitch userspace fin_timeout action have no effect in many cases. This bug is introduced by commit 88d73f6c411ac2f0578 (openvswitch: Use TCP flags in the flow key for stats.)
Reported-by: Len Gao <leng@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
#
86ec8dba |
| 05-May-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Fix ovs_flow_stats_get/clear RCU dereference.
For ovs_flow_stats_get() using ovsl_dereference() was wrong, since flow dumps call this with RCU read lock.
ovs_flow_stats_clear() is alwa
openvswitch: Fix ovs_flow_stats_get/clear RCU dereference.
For ovs_flow_stats_get() using ovsl_dereference() was wrong, since flow dumps call this with RCU read lock.
ovs_flow_stats_clear() is always called with ovs_mutex, so can use ovsl_dereference().
Also, make the ovs_flow_stats_get() 'flow' argument const to make later patches cleaner.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
#
bb6f9a70 |
| 05-May-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Clarify locking.
Remove unnecessary locking from functions that are always called with appropriate locking.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Thomas
openvswitch: Clarify locking.
Remove unnecessary locking from functions that are always called with appropriate locking.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
#
1139e241 |
| 05-May-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Compact sw_flow_key.
Minimize padding in sw_flow_key and move 'tp' top the main struct. These changes simplify code when accessing the transport port numbers and the tcp flags, and make
openvswitch: Compact sw_flow_key.
Minimize padding in sw_flow_key and move 'tp' top the main struct. These changes simplify code when accessing the transport port numbers and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit systems (128->120 bytes). These changes also make the keys for IPv4 packets to fit in one cache line.
There is a valid concern for safety of packing the struct ovs_key_ipv4_tunnel, as it would be possible to take the address of the tun_id member as a __be64 * which could result in unaligned access in some systems. However:
- sw_flow_key itself is 64-bit aligned, so the tun_id within is always 64-bit aligned. - We never make arrays of ovs_key_ipv4_tunnel (which would force every second tun_key to be misaligned). - We never take the address of the tun_id in to a __be64 *. - Whereever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key, it is in stack (on tunnel input functions), where compiler has full control of the alignment.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
show more ...
|
Revision tags: v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14 |
|
#
88d73f6c |
| 27-Mar-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Use TCP flags in the flow key for stats.
We already extract the TCP flags for the key, might as well use that for stats.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by
openvswitch: Use TCP flags in the flow key for stats.
We already extract the TCP flags for the key, might as well use that for stats.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
show more ...
|
#
63e7959c |
| 27-Mar-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Per NUMA node flow stats.
Keep kernel flow stats for each NUMA node rather than each (logical) CPU. This avoids using the per-CPU allocator and removes most of the kernel-side OVS lock
openvswitch: Per NUMA node flow stats.
Keep kernel flow stats for each NUMA node rather than each (logical) CPU. This avoids using the per-CPU allocator and removes most of the kernel-side OVS locking overhead otherwise on the top of perf reports and allows OVS to scale better with higher number of threads.
With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup rate doubles on a server with two hyper-threaded physical CPUs (16 logical cores each) compared to the current OVS master. Tested with non-trivial flow table with a TCP port match rule forcing all new connections with unique port numbers to OVS userspace. The IP addresses are still wildcarded, so the kernel flows are not considered as exact match 5-tuple flows. This type of flows can be expected to appear in large numbers as the result of more effective wildcarding made possible by improvements in OVS userspace flow classifier.
Perf results for this test (master):
Events: 305K cycles + 8.43% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner + 5.64% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock + 4.75% ovs-vswitchd ovs-vswitchd [.] find_match_wc + 3.32% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock + 2.61% ovs-vswitchd [kernel.kallsyms] [k] pcpu_alloc_area + 2.19% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range + 2.03% swapper [kernel.kallsyms] [k] intel_idle + 1.84% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock + 1.64% ovs-vswitchd ovs-vswitchd [.] classifier_lookup + 1.58% ovs-vswitchd libc-2.15.so [.] 0x7f4e6 + 1.07% ovs-vswitchd [kernel.kallsyms] [k] memset + 1.03% netperf [kernel.kallsyms] [k] __ticket_spin_lock + 0.92% swapper [kernel.kallsyms] [k] __ticket_spin_lock ...
And after this patch:
Events: 356K cycles + 6.85% ovs-vswitchd ovs-vswitchd [.] find_match_wc + 4.63% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_lock + 3.06% ovs-vswitchd [kernel.kallsyms] [k] __ticket_spin_lock + 2.81% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask_range + 2.51% ovs-vswitchd libpthread-2.15.so [.] pthread_mutex_unlock + 2.27% ovs-vswitchd ovs-vswitchd [.] classifier_lookup + 1.84% ovs-vswitchd libc-2.15.so [.] 0x15d30f + 1.74% ovs-vswitchd [kernel.kallsyms] [k] mutex_spin_on_owner + 1.47% swapper [kernel.kallsyms] [k] intel_idle + 1.34% ovs-vswitchd ovs-vswitchd [.] flow_hash_in_minimask + 1.33% ovs-vswitchd ovs-vswitchd [.] rule_actions_unref + 1.16% ovs-vswitchd ovs-vswitchd [.] hindex_node_with_hash + 1.16% ovs-vswitchd ovs-vswitchd [.] do_xlate_actions + 1.09% ovs-vswitchd ovs-vswitchd [.] ofproto_rule_ref + 1.01% netperf [kernel.kallsyms] [k] __ticket_spin_lock ...
There is a small increase in kernel spinlock overhead due to the same spinlock being shared between multiple cores of the same physical CPU, but that is barely visible in the netperf TCP_CRR test performance (maybe ~1% performance drop, hard to tell exactly due to variance in the test results), when testing for kernel module throughput (with no userspace activity, handful of kernel flows).
On flow setup, a single stats instance is allocated (for the NUMA node 0). As CPUs from multiple NUMA nodes start updating stats, new NUMA-node specific stats instances are allocated. This allocation on the packet processing code path is made to never block or look for emergency memory pools, minimizing the allocation latency. If the allocation fails, the existing preallocated stats instance is used. Also, if only CPUs from one NUMA-node are updating the preallocated stats instance, no additional stats instances are allocated. This eliminates the need to pre-allocate stats instances that will not be used, also relieving the stats reader from the burden of reading stats that are never used.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
show more ...
|
#
23dabf88 |
| 27-Mar-2014 |
Jarno Rajahalme <jrajahalme@nicira.com> |
openvswitch: Remove 5-tuple optimization.
The 5-tuple optimization becomes unnecessary with a later per-NUMA node stats patch. Remove it first to make the changes easier to grasp.
Signed-off-by: J
openvswitch: Remove 5-tuple optimization.
The 5-tuple optimization becomes unnecessary with a later per-NUMA node stats patch. Remove it first to make the changes easier to grasp.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
show more ...
|
Revision tags: v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5, v3.14-rc4 |
|
#
8c63ff09 |
| 18-Feb-2014 |
Joe Perches <joe@perches.com> |
openvswitch: Use ether_addr_copy
It's slightly smaller/faster for some architectures.
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
|
#
4f647e0a |
| 27-Mar-2014 |
Flavio Leitner <fbl@redhat.com> |
openvswitch: fix a possible deadlock and lockdep warning
There are two problematic situations.
A deadlock can happen when is_percpu is false because it can get interrupted while holding the spinloc
openvswitch: fix a possible deadlock and lockdep warning
There are two problematic situations.
A deadlock can happen when is_percpu is false because it can get interrupted while holding the spinlock. Then it executes ovs_flow_stats_update() in softirq context which tries to get the same lock.
The second sitation is that when is_percpu is true, the code correctly disables BH but only for the local CPU, so the following can happen when locking the remote CPU without disabling BH:
CPU#0 CPU#1 ovs_flow_stats_get() stats_read() +->spin_lock remote CPU#1 ovs_flow_stats_get() | <interrupted> stats_read() | ... +--> spin_lock remote CPU#0 | | <interrupted> | ovs_flow_stats_update() | ... | spin_lock local CPU#0 <--+ ovs_flow_stats_update() +---------------------------------- spin_lock local CPU#1
This patch disables BH for both cases fixing the deadlocks. Acked-by: Jesse Gross <jesse@nicira.com>
================================= [ INFO: inconsistent lock state ] 3.14.0-rc8-00007-g632b06a #1 Tainted: G I --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes: (&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch] {SOFTIRQ-ON-W} state was registered at: [<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80 [<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch] [<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch] [<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch] [<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch] [<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0 [<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0 [<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0 [<ffffffff816c0798>] genl_rcv+0x28/0x40 [<ffffffff816bf830>] netlink_unicast+0x100/0x1e0 [<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770 [<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0 [<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0 [<ffffffff8166a911>] __sys_sendmsg+0x51/0x90 [<ffffffff8166a962>] SyS_sendmsg+0x12/0x20 [<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b irq event stamp: 1740726 hardirqs last enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840 hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840 softirqs last enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50 softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&(&cpu_stats->lock)->rlock); <Interrupt> lock(&(&cpu_stats->lock)->rlock);
*** DEADLOCK ***
5 locks held by swapper/0/0: #0: (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840 #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0 #4: (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]
stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G I 3.14.0-rc8-00007-g632b06a #1 Hardware name: /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0 Call Trace: <IRQ> [<ffffffff817cfe3c>] dump_stack+0x4d/0x66 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180 [<ffffffff810f8963>] mark_lock+0x223/0x2b0 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch] [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch] [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch] [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch] [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch] [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch] [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch] [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch] [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|