#
06aa8b8a |
| 01-Aug-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frags: rename last_in to flags
The last_in field has been used to store various flags different from first/last frag in so give it a more descriptive name: flags.
Signed-off-by: Nikolay Aleks
inet: frags: rename last_in to flags
The last_in field has been used to store various flags different from first/last frag in so give it a more descriptive name: flags.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d2373862 |
| 01-Aug-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frags: use INC_STATS_BH in the ipv6 reassembly code
Softirqs are already disabled so no need to do it again, thus let's be consistent and use the IP6_INC_STATS_BH variant.
Signed-off-by: Niko
inet: frags: use INC_STATS_BH in the ipv6 reassembly code
Softirqs are already disabled so no need to do it again, thus let's be consistent and use the IP6_INC_STATS_BH variant.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.16-rc7 |
|
#
1bab4c75 |
| 24-Jul-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frag: set limits and make init_net's high_thresh limit global
This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit thresho
inet: frag: set limits and make init_net's high_thresh limit global
This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit threshold equal to the sum of the individual high_thresh limits which are capped. It also introduces some sane minimums for low_thresh as it shouldn't be able to drop below 0 (or > high_thresh in the unsigned case), and overall low_thresh should not ever be above high_thresh, so we make the following relations for a namespace: init_net: high_thresh - max(not capped), min(init_net low_thresh) low_thresh - max(init_net high_thresh), min (0)
all other namespaces: high_thresh = max(init_net high_thresh), min(namespace's low_thresh) low_thresh = max(namespace's high_thresh), min(0)
The major issue with having low_thresh > high_thresh is that we'll schedule eviction but never evict anything and thus rely only on the timers.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ab1c724f |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: use seqlock for hash rebuild
rehash is rare operation, don't force readers to take the read-side rwlock.
Instead, we only have to detect the (rare) case where the secret was altered whi
inet: frag: use seqlock for hash rebuild
rehash is rare operation, don't force readers to take the read-side rwlock.
Instead, we only have to detect the (rare) case where the secret was altered while we are trying to insert a new inetfrag queue into the table.
If it was changed, drop the bucket lock and recompute the hash to get the 'new' chain bucket that we have to insert into.
Joint work with Nikolay Aleksandrov.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
e3a57d18 |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: remove periodic secret rebuild timer
merge functionality into the eviction workqueue.
Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit.
If we
inet: frag: remove periodic secret rebuild timer
merge functionality into the eviction workqueue.
Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit.
If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds.
ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion.
Joint work with Nikolay Aleksandrov.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3fd588eb |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: remove lru list
no longer used.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b13d3cbf |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: move eviction of queues to work queue
When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. Th
inet: frag: move eviction of queues to work queue
When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context.
This has two drawbacks:
1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim.
2) LRU list maintenance is expensive.
But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue.
This moves eviction out of the fast path:
When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired.
When the high threshold is reached, no more fragment queues are created until we're below the limit again.
The LRU list is now unused and will be removed in a followup patch.
Joint work with Nikolay Aleksandrov.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
86e93e47 |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: move evictor calls into frag_find function
First step to move eviction handling into a work queue.
We lose two spots that accounted evicted fragments in MIB counters.
Accounting will b
inet: frag: move evictor calls into frag_find function
First step to move eviction handling into a work queue.
We lose two spots that accounted evicted fragments in MIB counters.
Accounting will be restored since the upcoming work-queue evictor invokes the frag queue timer callbacks instead.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
fb3cfe6e |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: remove hash size assumptions from callers
hide actual hash size from individual users: The _find function will now fold the given hash value into the required range.
Signed-off-by: Flor
inet: frag: remove hash size assumptions from callers
hide actual hash size from individual users: The _find function will now fold the given hash value into the required range.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
36c77782 |
| 24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: constify match, hashfn and constructor arguments
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1, v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5, v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12, v3.12-rc7 |
|
#
b1190570 |
| 23-Oct-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
Defer the fragmentation hash secret initialization for IPv6 like the previous patch did for IPv4.
Because t
ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
Defer the fragmentation hash secret initialization for IPv6 like the previous patch did for IPv4.
Because the netfilter logic reuses the hash secret we have to split it first. Thus introduce a new nf_hash_frag function which takes care to seed the hash secret.
Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6 |
|
#
f46078cf |
| 16-Aug-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: drop packets with multiple fragmentation headers
It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentat
ipv6: drop packets with multiple fragmentation headers
It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentation logic and send back a parameter problem icmp.
The updates for RFC 6980 will come in later, I have to do a bit more research here.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8 |
|
#
97599dc7 |
| 16-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
net: drop dst before queueing fragments
Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, as non refcounted dst could escape an RCU prot
net: drop dst before queueing fragments
Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, as non refcounted dst could escape an RCU protected section.
Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed the case of timeouts, but not the general problem.
Tom Parkin noticed crashes in UDP stack and provided a patch, but further analysis permitted us to pinpoint the root cause.
Before queueing a packet into a frag list, we must drop its dst, as this dst has limited lifetime (RCU protected)
When/if a packet is finally reassembled, we use the dst of the very last skb, still protected by RCU and valid, as the dst of the reassembled packet.
Use same logic in IPv6, as there is no need to hold dst references.
Reported-by: Tom Parkin <tparkin@katalix.com> Tested-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.9-rc7, v3.9-rc6, v3.9-rc5, v3.9-rc4 |
|
#
eec2e618 |
| 22-Mar-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
Hello!
After patch 1 got accepted to net-next I will also send a patch to netfilter-devel to make the corresponding chan
ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
Hello!
After patch 1 got accepted to net-next I will also send a patch to netfilter-devel to make the corresponding changes to the netfilter reassembly logic.
Thanks,
Hannes
-- >8 -- [PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
This patch also ensures that INET_ECN_CE is propagated if one fragment had the codepoint set.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.9-rc3 |
|
#
5a3da1fe |
| 15-Mar-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat
inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu.
If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c.
I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.9-rc2, v3.9-rc1, v3.8 |
|
#
8064b3cf |
| 18-Feb-2013 |
Eric Dumazet <edumazet@google.com> |
ipv6: fix a sparse warning
net/ipv6/reassembly.c:82:72: warning: incorrect type in argument 3 (different base types) net/ipv6/reassembly.c:82:72: expected unsigned int [unsigned] [usertype] c net
ipv6: fix a sparse warning
net/ipv6/reassembly.c:82:72: warning: incorrect type in argument 3 (different base types) net/ipv6/reassembly.c:82:72: expected unsigned int [unsigned] [usertype] c net/ipv6/reassembly.c:82:72: got restricted __be32 [usertype] id
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
279e9f2f |
| 15-Feb-2013 |
Eric Dumazet <edumazet@google.com> |
ipv6: optimize inet6_hash_frag()
Use ipv6_addr_hash() and a single jhash invocation.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14bbd6a5 |
| 14-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
net: Add skb_unclone() helper function.
This function will be used in next GRE_GSO patch. This patch does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by:
net: Add skb_unclone() helper function.
This function will be used in next GRE_GSO patch. This patch does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com>
show more ...
|
Revision tags: v3.8-rc7, v3.8-rc6 |
|
#
3ef0eb0d |
| 28-Jan-2013 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: frag, move LRU list maintenance outside of rwlock
Updating the fragmentation queues LRU (Least-Recently-Used) list, required taking the hash writer lock. However, the LRU list isn't tied to th
net: frag, move LRU list maintenance outside of rwlock
Updating the fragmentation queues LRU (Least-Recently-Used) list, required taking the hash writer lock. However, the LRU list isn't tied to the hash at all, so we can use a separate lock for it.
Original-idea-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d433673e |
| 28-Jan-2013 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: frag helper functions for mem limit tracking
This change is primarily a preparation to ease the extension of memory limit tracking.
The change does reduce the number atomic operation, during f
net: frag helper functions for mem limit tracking
This change is primarily a preparation to ease the extension of memory limit tracking.
The change does reduce the number atomic operation, during freeing of a frag queue. This does introduce a some performance improvement, as these atomic operations are at the core of the performance problems seen on NUMA systems.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.8-rc5, v3.8-rc4, v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6 |
|
#
464dc801 |
| 15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Don't export sysctls to unprivileged users
In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow t
net: Don't export sysctls to unprivileged users
In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users.
This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6, v3.6-rc7 |
|
#
6b102865 |
| 18-Sep-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: unify fragment thresh handling code
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@r
ipv6: unify fragment thresh handling code
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d4915c08 |
| 18-Sep-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by
ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b836c99f |
| 18-Sep-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: unify conntrack reassembly expire code with standard one
Two years ago, Shan Wei tried to fix this: http://patchwork.ozlabs.org/patch/43905/
The problem is that RFC2460 requires an ICMP Time
ipv6: unify conntrack reassembly expire code with standard one
Two years ago, Shan Wei tried to fix this: http://patchwork.ozlabs.org/patch/43905/
The problem is that RFC2460 requires an ICMP Time Exceeded -- Fragment Reassembly Time Exceeded message should be sent to the source of that fragment, if the defragmentation times out.
" If insufficient fragments are received to complete reassembly of a packet within 60 seconds of the reception of the first-arriving fragment of that packet, reassembly of that packet must be abandoned and all the fragments that have been received for that packet must be discarded. If the first fragment (i.e., the one with a Fragment Offset of zero) has been received, an ICMP Time Exceeded -- Fragment Reassembly Time Exceeded message should be sent to the source of that fragment. "
As Herbert suggested, we could actually use the standard IPv6 reassembly code which follows RFC2460.
With this patch applied, I can see ICMP Time Exceeded sent from the receiver when the sender sent out 3/4 fragmented IPv6 UDP packet.
Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4 |
|
#
ec16439e |
| 18-May-2012 |
Eric Dumazet <edumazet@google.com> |
ipv6: use skb coalescing in reassembly
ip6_frag_reasm() can use skb_try_coalesce() to build optimized skb, reducing memory used by them (truesize), and reducing number of cache line misses and overh
ipv6: use skb coalescing in reassembly
ip6_frag_reasm() can use skb_try_coalesce() to build optimized skb, reducing memory used by them (truesize), and reducing number of cache line misses and overhead for the consumer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|