#
b6ca8bd5 |
| 28-Nov-2017 |
David Miller <davem@davemloft.net> |
xfrm: Move child route linkage into xfrm_dst.
XFRM bundle child chains look like this:
xdst1 --> xdst2 --> xdst3 --> path_dst
All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL. T
xfrm: Move child route linkage into xfrm_dst.
XFRM bundle child chains look like this:
xdst1 --> xdst2 --> xdst3 --> path_dst
All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL. The final child pointer in the chain, here called 'path_dst', is some other kind of route such as an ipv4 or ipv6 one.
The xfrm output path pops routes, one at a time, via the child pointer, until we hit one which has a dst->xfrm pointer which is NULL.
We can easily preserve the above mechanisms with child sitting only in the xfrm_dst structure. All children in the chain before we break out of the xfrm_output() loop have dst->xfrm non-NULL and are therefore xfrm_dst objects.
Since we break out of the loop when we find dst->xfrm NULL, we will not try to dereference 'dst' as if it were an xfrm_dst.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b92cf4aa |
| 28-Nov-2017 |
David Miller <davem@davemloft.net> |
net: Create and use new helper xfrm_dst_child().
Only IPSEC routes have a non-NULL dst->child pointer. And IPSEC routes are identified by a non-NULL dst->xfrm pointer.
Signed-off-by: David S. Mill
net: Create and use new helper xfrm_dst_child().
Only IPSEC routes have a non-NULL dst->child pointer. And IPSEC routes are identified by a non-NULL dst->xfrm pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.13.16, v4.14 |
|
#
833e0e2f |
| 10-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
net: dst: move cpu inside ifdef to avoid compilation warning
If CONFIG_DST_CACHE is not selected cpu variable will be unused and we will see a compilation warning. Move it under the ifdef.
Reported
net: dst: move cpu inside ifdef to avoid compilation warning
If CONFIG_DST_CACHE is not selected cpu variable will be unused and we will see a compilation warning. Move it under the ifdef.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: d66f2b91f95b ("bpf: don't rely on the verifier lock for metadata_dst allocation") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d66f2b91 |
| 09-Oct-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
bpf: don't rely on the verifier lock for metadata_dst allocation
bpf_skb_set_tunnel_*() functions require allocation of per-cpu metadata_dst. The allocation happens upon verification of the first p
bpf: don't rely on the verifier lock for metadata_dst allocation
bpf_skb_set_tunnel_*() functions require allocation of per-cpu metadata_dst. The allocation happens upon verification of the first program using those helpers. In preparation for removing the verifier lock, use cmpxchg() to make sure we only allocate the metadata_dsts once.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.13.5, v4.13 |
|
#
e65a4955 |
| 18-Aug-2017 |
David Lamparter <equinox@diac24.net> |
net: check type when freeing metadata dst
Commit 3fcece12bc1b ("net: store port/representator id in metadata_dst") added a new type field to metadata_dst, but metadata_dst_free() wasn't updated to c
net: check type when freeing metadata dst
Commit 3fcece12bc1b ("net: store port/representator id in metadata_dst") added a new type field to metadata_dst, but metadata_dst_free() wasn't updated to check it before freeing the METADATA_IP_TUNNEL specific dst cache entry.
This is not currently causing problems since it's far enough back in the struct to be zeroed for the only other type currently in existance (METADATA_HW_PORT_MUX), but nevertheless it's not correct.
Fixes: 3fcece12bc1b ("net: store port/representator id in metadata_dst") Signed-off-by: David Lamparter <equinox@diac24.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Sridhar Samudrala <sridhar.samudrala@intel.com> Cc: Simon Horman <horms@verge.net.au> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9620fef2 |
| 18-Aug-2017 |
Eric Dumazet <edumazet@google.com> |
ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to
ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.12 |
|
#
3fcece12 |
| 23-Jun-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
net: store port/representator id in metadata_dst
Switches and modern SR-IOV enabled NICs may multiplex traffic from Port representators and control messages over single set of hardware queues. Contr
net: store port/representator id in metadata_dst
Switches and modern SR-IOV enabled NICs may multiplex traffic from Port representators and control messages over single set of hardware queues. Control messages and muxed traffic may need ordered delivery.
Those requirements make it hard to comfortably use TC infrastructure today unless we have a way of attaching metadata to skbs at the upper device. Because single set of queues is used for many netdevs stopping TC/sched queues of all of them reliably is impossible and lower device has to retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on the fastpath.
This patch attempts to enable port/representative devs to attach metadata to skbs which carry port id. This way representatives can be queueless and all queuing can be performed at the lower netdev in the usual way.
Traffic arriving on the port/representative interfaces will be have metadata attached and will subsequently be queued to the lower device for transmission. The lower device should recognize the metadata and translate it to HW specific format which is most likely either a special header inserted before the network headers or descriptor/metadata fields.
Metadata is associated with the lower device by storing the netdev pointer along with port id so that if TC decides to redirect or mirror the new netdev will not try to interpret it.
This is mostly for SR-IOV devices since switches don't have lower netdevs today.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a4c2fd7f |
| 17-Jun-2017 |
Wei Wang <weiwan@google.com> |
net: remove DST_NOCACHE flag
DST_NOCACHE flag check has been removed from dst_release() and dst_hold_safe() in a previous patch because all the dst are now ref counted properly and can be released b
net: remove DST_NOCACHE flag
DST_NOCACHE flag check has been removed from dst_release() and dst_hold_safe() in a previous patch because all the dst are now ref counted properly and can be released based on refcnt only. Looking at the rest of the DST_NOCACHE use, all of them can now be removed or replaced with other checks. So this patch gets rid of all the DST_NOCACHE usage and remove this flag completely.
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b2a9c0ed |
| 17-Jun-2017 |
Wei Wang <weiwan@google.com> |
net: remove DST_NOGC flag
Now that all the components have been changed to release dst based on refcnt only and not depend on dst gc anymore, we can remove the temporary flag DST_NOGC.
Note that we
net: remove DST_NOGC flag
Now that all the components have been changed to release dst based on refcnt only and not depend on dst gc anymore, we can remove the temporary flag DST_NOGC.
Note that we also need to remove the DST_NOCACHE check in dst_release() and dst_hold_safe() because now all the dst are released based on refcnt and behaves as DST_NOCACHE.
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5b7c9a8f |
| 17-Jun-2017 |
Wei Wang <weiwan@google.com> |
net: remove dst gc related code
This patch removes all dst gc related code and all the dst free functions
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signe
net: remove dst gc related code
This patch removes all dst gc related code and all the dst free functions
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
52df157f |
| 17-Jun-2017 |
Wei Wang <weiwan@google.com> |
xfrm: take refcnt of dst when creating struct xfrm_dst bundle
During the creation of xfrm_dst bundle, always take ref count when allocating the dst. This way, xfrm_bundle_create() will form a linked
xfrm: take refcnt of dst when creating struct xfrm_dst bundle
During the creation of xfrm_dst bundle, always take ref count when allocating the dst. This way, xfrm_bundle_create() will form a linked list of dst with dst->child pointing to a ref counted dst child. And the returned dst pointer is also ref counted. This makes the link from the flow cache to this dst now ref counted properly. As the dst is always ref counted properly, we can safely mark DST_NOGC flag so dst_release() will release dst based on refcnt only. And dst gc is no longer needed and all dst_free() and its related function calls should be replaced with dst_release() or dst_release_immediate().
The special handling logic for dst->child in dst_destroy() can be replaced with a simple dst_release_immediate() call on the child to release the whole list linked by dst->child pointer. Previously used DST_NOHASH flag is not needed anymore as well. The reason that DST_NOHASH is used in the existing code is mainly to prevent the dst inserted in the fib tree to be wrongly destroyed during the deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH flag is marked in all the dst children except the one which is in the fib tree. However, with this patch series to remove dst gc logic and release dst only based on ref count, it is safe to release all the children from a xfrm_dst bundle as long as the dst children are all ref counted properly which is already the case in the existing code. So, this patch removes the use of DST_NOHASH flag.
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4a6ce2b6 |
| 17-Jun-2017 |
Wei Wang <weiwan@google.com> |
net: introduce a new function dst_dev_put()
This function should be called when removing routes from fib tree after the dst gc is no longer in use. We first mark DST_OBSOLETE_DEAD on this dst to mak
net: introduce a new function dst_dev_put()
This function should be called when removing routes from fib tree after the dst gc is no longer in use. We first mark DST_OBSOLETE_DEAD on this dst to make sure next dst_ops->check() fails and returns NULL. Secondly, as we no longer keep the gc_list, we need to properly release dst->dev right at the moment when the dst is removed from the fib/fib6 tree. It does the following: 1. change dst->input and output pointers to dst_discard/dst_dscard_out to discard all packets 2. replace dst->dev with loopback interface
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5f56f409 |
| 17-Jun-2017 |
Wei Wang <weiwan@google.com> |
net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt
The current mechanism of freeing dst is a bit complicated. dst has its ref count and when user grabs the reference to the dst,
net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt
The current mechanism of freeing dst is a bit complicated. dst has its ref count and when user grabs the reference to the dst, the ref count is properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing code due to some historic reasons.
If the reference to dst is always taken properly, we should be able to simplify the logic in dst_release() to destroy dst when dst->__refcnt drops from 1 to 0. And this should be the only condition to determine if we can call dst_destroy(). And as dst is always ref counted, there is no need for a dst garbage list to hold the dst entries that already get removed by the routing code but are still held by other users. And the task to periodically check the list to free dst if ref count become 0 is also not needed anymore.
This patch introduces a temporary flag DST_NOGC(no garbage collector). If it is set in the dst, dst_release() will call dst_destroy() when dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag and do atomic_inc_not_zero() similar as DST_NOCACHE to avoid double free issue. This temporary flag is mainly used so that we can make the transition component by component without breaking other parts. This flag will be removed after all components are properly transitioned.
This patch also introduces a new function dst_release_immediate() which destroys dst without waiting on the rcu when refcnt drops to 0. It will be used in later patches.
Follow-up patches will correct all the places to properly take ref count on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will be used to release the dst instead of dst_free() and its related functions. And final clean-up patch will remove the DST_NOGC flag.
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f186ce61 |
| 08-Jun-2017 |
Krister Johansen <kjlx@templeofstupid.com> |
Fix an intermittent pr_emerg warning about lo becoming free.
It looks like this:
Message from syslogd@flamingo at Apr 26 00:45:00 ... kernel:unregister_netdevice: waiting for lo to become free. Us
Fix an intermittent pr_emerg warning about lo becoming free.
It looks like this:
Message from syslogd@flamingo at Apr 26 00:45:00 ... kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4
They seem to coincide with net namespace teardown.
The message is emitted by netdev_wait_allrefs().
Forced a kdump in netdev_run_todo, but found that the refcount on the lo device was already 0 at the time we got to the panic.
Used bcc to check the blocking in netdev_run_todo. The only places where we're off cpu there are in the rcu_barrier() and msleep() calls. That behavior is expected. The msleep time coincides with the amount of time we spend waiting for the refcount to reach zero; the rcu_barrier() wait times are not excessive.
After looking through the list of callbacks that the netdevice notifiers invoke in this path, it appears that the dst_dev_event is the most interesting. The dst_ifdown path places a hold on the loopback_dev as part of releasing the dev associated with the original dst cache entry. Most of our notifier callbacks are straight-forward, but this one a) looks complex, and b) places a hold on the network interface in question.
I constructed a new bcc script that watches various events in the liftime of a dst cache entry. Note that dst_ifdown will take a hold on the loopback device until the invalidated dst entry gets freed.
[ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183 __dst_free rcu_nocb_kthread kthread ret_from_fork Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3fb07daf |
| 25-May-2017 |
Eric Dumazet <edumazet@google.com> |
ipv4: add reference counting to metrics
Andrey Konovalov reported crashes in ipv4_mtu()
I could reproduce the issue with KASAN kernels, between 10.246.7.151 and 10.246.7.152 :
1) 20 concurrent net
ipv4: add reference counting to metrics
Andrey Konovalov reported crashes in ipv4_mtu()
I could reproduce the issue with KASAN kernels, between 10.246.7.151 and 10.246.7.152 :
1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
2) At the same time run following loop : while : do ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 done
Cong Wang attempted to add back rt->fi in commit 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting") but this proved to add some issues that were complex to solve.
Instead, I suggested to add a refcount to the metrics themselves, being a standalone object (in particular, no reference to other objects)
I tried to make this patch as small as possible to ease its backport, instead of being super clean. Note that we believe that only ipv4 dst need to take care of the metric refcount. But if this is wrong, this patch adds the basic infrastructure to extend this to other families.
Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang for his efforts on this problem.
Fixes: 2860583fe840 ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2, v4.10.1, v4.10 |
|
#
51ce8bd4 |
| 06-Feb-2017 |
Julian Anastasov <ja@ssi.bg> |
net: pending_confirm is not used anymore
When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. As last step, we can remove the pending_con
net: pending_confirm is not used anymore
When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. As last step, we can remove the pending_confirm flag.
Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9, v4.4.8, v4.4.7, openbmc-20160329-2, openbmc-20160329-1, openbmc-20160321-1, v4.4.6, v4.5, v4.4.5, v4.4.4, v4.4.3, openbmc-20160222-1, v4.4.2 |
|
#
d71785ff |
| 12-Feb-2016 |
Paolo Abeni <pabeni@redhat.com> |
net: add dst_cache to ovs vxlan lwtunnel
In case of UDP traffic with datagram length below MTU this give about 2% performance increase when tunneling over ipv4 and about 60% when tunneling over ipv6
net: add dst_cache to ovs vxlan lwtunnel
In case of UDP traffic with datagram length below MTU this give about 2% performance increase when tunneling over ipv4 and about 60% when tunneling over ipv6
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-20160212-1, openbmc-20160210-1, openbmc-20160202-2, openbmc-20160202-1, v4.4.1, openbmc-20160127-1, openbmc-20160120-1, v4.4 |
|
#
07a5d384 |
| 06-Jan-2016 |
Francesco Ruggeri <fruggeri@aristanetworks.com> |
net: possible use after free in dst_release
dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before
net: possible use after free in dst_release
dst_release should not access dst->flags after decrementing __refcnt to 0. The dst_entry may be in dst_busy_list and dst_gc_task may dst_destroy it before dst_release gets a chance to access dst->flags.
Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()") Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-20151217-1, openbmc-20151210-1, openbmc-20151202-1, openbmc-20151123-1, openbmc-20151118-1 |
|
#
d69bbf88 |
| 09-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: fix a race in dst_release()
Only cpu seeing dst refcount going to 0 can safely dereference dst->flags.
Otherwise an other cpu might already have freed the dst.
Fixes: 27b75c95f10d ("net: avoi
net: fix a race in dst_release()
Only cpu seeing dst refcount going to 0 can safely dereference dst->flags.
Otherwise an other cpu might already have freed the dst.
Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-20151104-1, v4.3, openbmc-20151102-1, openbmc-20151028-1 |
|
#
ede2059d |
| 07-Oct-2015 |
Eric W. Biederman <ebiederm@xmission.com> |
dst: Pass net into dst->output
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> S
dst: Pass net into dst->output
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.3-rc1 |
|
#
63b6c13d |
| 31-Aug-2015 |
Pravin B Shelar <pshelar@nicira.com> |
tun_dst: Remove opts_size
opts_size is only written and never read. Following patch removes this unused variable.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller
tun_dst: Remove opts_size
opts_size is only written and never read. Following patch removes this unused variable.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.2 |
|
#
e252b3d1 |
| 25-Aug-2015 |
WANG Cong <xiyou.wangcong@gmail.com> |
route: fix a use-after-free
This patch fixes the following crash:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166 Hardware name:
route: fix a use-after-free
This patch fixes the following crash:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc7+ #166 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88010656d280 ti: ffff880106570000 task.ti: ffff880106570000 RIP: 0010:[<ffffffff8182f91b>] [<ffffffff8182f91b>] dst_destroy+0xa6/0xef RSP: 0018:ffff880107603e38 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800d225a000 RCX: ffffffff82250fd0 RDX: 0000000000000001 RSI: ffffffff82250fd0 RDI: 6b6b6b6b6b6b6b6b RBP: ffff880107603e58 R08: 0000000000000001 R09: 0000000000000001 R10: 000000000000b530 R11: ffff880107609000 R12: 0000000000000000 R13: ffffffff82343c40 R14: 0000000000000000 R15: ffffffff8182fb4f FS: 0000000000000000(0000) GS:ffff880107600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fcabd9d3000 CR3: 00000000d7279000 CR4: 00000000000006e0 Stack: ffffffff82250fd0 ffff8801077d6f00 ffffffff82253c40 ffff8800d225a000 ffff880107603e68 ffffffff8182fb5d ffff880107603f08 ffffffff810d795e ffffffff810d7648 ffff880106574000 ffff88010656d280 ffff88010656d280 Call Trace: <IRQ> [<ffffffff8182fb5d>] dst_destroy_rcu+0xe/0x1d [<ffffffff810d795e>] rcu_process_callbacks+0x618/0x7eb [<ffffffff810d7648>] ? rcu_process_callbacks+0x302/0x7eb [<ffffffff8182fb4f>] ? dst_gc_task+0x1eb/0x1eb [<ffffffff8107e11b>] __do_softirq+0x178/0x39f [<ffffffff8107e52e>] irq_exit+0x41/0x95 [<ffffffff81a4f215>] smp_apic_timer_interrupt+0x34/0x40 [<ffffffff81a4d5cd>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff8100b968>] ? default_idle+0x21/0x32 [<ffffffff8100b966>] ? default_idle+0x1f/0x32 [<ffffffff8100bf19>] arch_cpu_idle+0xf/0x11 [<ffffffff810b0bc7>] default_idle_call+0x1f/0x21 [<ffffffff810b0dce>] cpu_startup_entry+0x1ad/0x273 [<ffffffff8102fe67>] start_secondary+0x135/0x156
dst is freed right before lwtstate_put(), this is not correct...
Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry") Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.2-rc8 |
|
#
61adedf3 |
| 20-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
route: move lwtunnel state to dst_entry
Currently, the lwtunnel state resides in per-protocol data. This is a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa). The xmit funct
route: move lwtunnel state to dst_entry
Currently, the lwtunnel state resides in per-protocol data. This is a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa). The xmit function of the tunnel does not know whether the packet has been routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the lwtstate data to dst_entry makes such inter-protocol tunneling possible.
As a bonus, this brings a nice diffstat.
Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.2-rc7, v4.2-rc6, v4.2-rc5 |
|
#
d3aa45ce |
| 30-Jul-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
bpf: add helpers to access tunnel metadata
Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata: bpf_skb_[gs]et_tunnel_key(skb, key, size, flags) skb: pointer to skb key:
bpf: add helpers to access tunnel metadata
Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata: bpf_skb_[gs]et_tunnel_key(skb, key, size, flags) skb: pointer to skb key: pointer to 'struct bpf_tunnel_key' size: size of 'struct bpf_tunnel_key' flags: room for future extensions
First eBPF program that uses these helpers will allocate per_cpu metadata_dst structures that will be used on TX. On RX metadata_dst is allocated by tunnel driver.
Typical usage for TX: struct bpf_tunnel_key tkey; ... populate tkey ... bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
RX: struct bpf_tunnel_key tkey = {}; bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0); ... lookup or redirect based on tkey ...
'struct bpf_tunnel_key' will be extended in the future by adding elements to the end and the 'size' argument will indicate which fields are populated, thereby keeping backwards compatibility. The 'flags' argument may be used as well when the 'size' is not enough or to indicate completely different layout of bpf_tunnel_key.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.2-rc4 |
|
#
f38a9eb1 |
| 21-Jul-2015 |
Thomas Graf <tgraf@suug.ch> |
dst: Metadata destinations
Introduces a new dst_metadata which enables to carry per packet metadata between forwarding and processing elements via the skb->dst pointer.
The structure is set up to b
dst: Metadata destinations
Introduces a new dst_metadata which enables to carry per packet metadata between forwarding and processing elements via the skb->dst pointer.
The structure is set up to be a union. Thus, each separate type of metadata requires its own dst instance. If demand arises to carry multiple types of metadata concurrently, metadata dst entries can be made stackable.
The metadata dst entry is refcnt'ed as expected for now but a non reference counted use is possible if the reference is forced before queueing the skb.
In order to allow allocating dsts with variable length, the existing dst_alloc() is split into a dst_alloc() and dst_init() function. The existing dst_init() function to initialize the subsystem is being renamed to dst_subsys_init() to make it clear what is what.
The check before ip_route_input() is changed to ignore metadata dsts and drop the dst inside the routing function thus allowing to interpret metadata in a later commit.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|