#
7b9438de |
| 17-Jan-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add stream reconf timer
This patch is to add a per transport timer based on sctp timer frame for stream reconf chunk retransmission. It would start after sending a reconf request chunk, and st
sctp: add stream reconf timer
This patch is to add a per transport timer based on sctp timer frame for stream reconf chunk retransmission. It would start after sending a reconf request chunk, and stop after receiving the response chunk.
If the timer expires, besides retransmitting the reconf request chunk, it would also do the same thing with data RTO timer. like to increase the appropriate error counts, and perform threshold management, possibly destroying the asoc if sctp retransmission thresholds are exceeded, just as section 5.1.1 describes.
This patch is also to add asoc strreset_chunk, it is used to save the reconf request chunk, so that it can be retransmitted, and to check if the response is really for this request by comparing the information inside with the response chunk as well.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
cc16f00f |
| 17-Jan-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for generating stream reconf ssn reset request chunk
This patch is to add asoc strreset_outseq and strreset_inseq for saving the reconf request sequence, initialize them when creat
sctp: add support for generating stream reconf ssn reset request chunk
This patch is to add asoc strreset_outseq and strreset_inseq for saving the reconf request sequence, initialize them when create assoc and process init, and also to define Incoming and Outgoing SSN Reset Request Parameter described in rfc6525 section 4.1 and 4.2, As they can be in one same chunk as section rfc6525 3.1-3 describes, it makes them in one function.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a8386317 |
| 06-Jan-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: prepare asoc stream for stream reconf
sctp stream reconf, described in RFC 6525, needs a structure to save per stream information in assoc, like stream state.
In the future, sctp stream sched
sctp: prepare asoc stream for stream reconf
sctp stream reconf, described in RFC 6525, needs a structure to save per stream information in assoc, like stream state.
In the future, sctp stream scheduler also needs it to save some stream scheduler params and queues.
This patchset is to prepare the stream array in assoc for stream reconf. It defines sctp_stream that includes stream arrays inside to replace ssnmap.
Note that we use different structures for IN and OUT streams, as the members in per OUT stream will get more and more different from per IN stream.
v1->v2: - put these patches into a smaller group. v2->v3: - define sctp_stream to contain stream arrays, and create stream.c to put stream-related functions. - merge 3 patches into 1, as new sctp_stream has the same name with before.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
66b91d2c |
| 28-Dec-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: remove return value from sctp_packet_init/config
There is no reason to use this cascading. It doesn't add anything. Let's remove it and simplify.
Signed-off-by: Marcelo Ricardo Leitner <marce
sctp: remove return value from sctp_packet_init/config
There is no reason to use this cascading. It doesn't add anything. Let's remove it and simplify.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33 |
|
#
7fda702f |
| 15-Nov-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: use new rhlist interface on sctp transport rhashtable
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs co
sctp: use new rhlist interface on sctp transport rhashtable
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain.
It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there.
The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain.
This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25 |
|
#
8ae808eb |
| 07-Oct-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove the old ttl expires policy
The prsctp polices include ttl expires policy already, we should remove the old ttl expires codes, and just adjust the new polices' codes to be compatible wit
sctp: remove the old ttl expires policy
The prsctp polices include ttl expires policy already, we should remove the old ttl expires codes, and just adjust the new polices' codes to be compatible with the old one for users.
This patch is to remove all the old expires codes, and if prsctp polices are not set, it will still set msg's expires_at and check the expires in sctp_check_abandoned.
Note that asoc->prsctp_enable is set by default, so users can't feel any difference even if they use the old expires api in userspace.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
cc6ac9bc |
| 07-Oct-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: reuse sent_count to avoid retransmitted chunks for RTT measurements
Now sctp uses chunk->resent to record if a chunk is retransmitted, for RTT measurements with retransmitted DATA chunks. chun
sctp: reuse sent_count to avoid retransmitted chunks for RTT measurements
Now sctp uses chunk->resent to record if a chunk is retransmitted, for RTT measurements with retransmitted DATA chunks. chunk->sent_count was introduced to record how many times one chunk has been sent for prsctp RTX policy before. We actually can know if one chunk is retransmitted by checking chunk->sent_count is greater than 1.
This patch is to remove resent from sctp_chunk and reuse sent_count to avoid retransmitted chunks for RTT measurements.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6 |
|
#
0605483f |
| 28-Sep-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove prsctp_param from sctp_chunk
Now sctp uses chunk->prsctp_param to save the prsctp param for all the prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk. We can just u
sctp: remove prsctp_param from sctp_chunk
Now sctp uses chunk->prsctp_param to save the prsctp param for all the prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk. We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices, and reuse msg->expires_at for TTL policy, as the prsctp polices and old expires policy are mutual exclusive.
This patch is to remove prsctp_param from sctp_chunk, and reuse msg's expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF polices.
Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy, as it needs a u64 variables to save the expires_at time.
This one also fixes the "netperf-Throughput_Mbps -37.2% regression" issue.
Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
73dca124 |
| 28-Sep-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: move sent_count to the memory hole in sctp_chunk
Now pahole sctp_chunk, it has 2 memory holes: struct sctp_chunk { struct list_head list; atomic_t refcnt; /*
sctp: move sent_count to the memory hole in sctp_chunk
Now pahole sctp_chunk, it has 2 memory holes: struct sctp_chunk { struct list_head list; atomic_t refcnt; /* XXX 4 bytes hole, try to pack */ ... long unsigned int prsctp_param; int sent_count; /* XXX 4 bytes hole, try to pack */
This patch is to move up sent_count to fill the 1st one and eliminate the 2nd one.
It's not just another struct compaction, it also fixes the "netperf- Throughput_Mbps -37.2% regression" issue when overloading the CPU.
Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.7.5, v4.4.22, v4.4.21, v4.7.4 |
|
#
83dbc3d4 |
| 13-Sep-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: make sctp_outq_flush/tail/uncork return void
sctp_outq_flush return value is meaningless now, this patch is to make sctp_outq_flush return void, as well as sctp_outq_fail and sctp_outq_uncork.
sctp: make sctp_outq_flush/tail/uncork return void
sctp_outq_flush return value is meaningless now, this patch is to make sctp_outq_flush return void, as well as sctp_outq_fail and sctp_outq_uncork.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b61c654f |
| 13-Sep-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: free msg->chunks when sctp_primitive_SEND return err
Last patch "sctp: do not return the transmit err back to sctp_sendmsg" made sctp_primitive_SEND return err only when asoc state is unavaila
sctp: free msg->chunks when sctp_primitive_SEND return err
Last patch "sctp: do not return the transmit err back to sctp_sendmsg" made sctp_primitive_SEND return err only when asoc state is unavailable. In this case, chunks are not enqueued, they have no chance to be freed if we don't take care of them later.
This Patch is actually to revert commit 1cd4d5c4326a ("sctp: remove the unused sctp_datamsg_free()"), commit 69b5777f2e57 ("sctp: hold the chunks only after the chunk is enqueued in outq") and commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg"), to use sctp_datamsg_free to free the chunks of current msg.
Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1 |
|
#
e7487c86 |
| 13-Jul-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: avoid identifying address family many times for a chunk
Identifying address family operations during rx path is not something expensive but it's ugly to the eye to have it done multiple times,
sctp: avoid identifying address family many times for a chunk
Identifying address family operations during rx path is not something expensive but it's ugly to the eye to have it done multiple times, specially when we already validated it during initial rx processing.
This patch takes advantage of the now shared sctp_input_cb and make the pointer to the operations readily available.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
1f45f78f |
| 13-Jul-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: allow GSO frags to access the chunk too
SCTP will try to access original IP headers on sctp_recvmsg in order to copy the addresses used. There are also other places that do similar access to I
sctp: allow GSO frags to access the chunk too
SCTP will try to access original IP headers on sctp_recvmsg in order to copy the addresses used. There are also other places that do similar access to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO support") they aren't always there because they are only present in the header skb.
SCTP handles the queueing of incoming data by cloning the incoming skb and limiting to only the relevant payload. This clone has its cb updated to something different and it's then queued on socket rx queue. Thus we need to fix this in two moments.
For rx path, not related to socket queue yet, this patch uses a partially copied sctp_input_cb to such GSO frags. This restores the ability to access the headers for this part of the code.
Regarding the socket rx queue, it removes iif member from sctp_event and also add a chunk pointer on it.
With these changes we're always able to reach the headers again.
The biggest change here is that now the sctp_chunk struct and the original skb are only freed after the application consumed the buffer. Note however that the original payload was already like this due to the skb cloning.
For iif, SCTP's IPv4 code doesn't use it, so no change is necessary. IPv6 now can fetch it directly from original's IPv6 CB as the original skb is still accessible.
In the future we probably can simplify sctp_v*_skb_iif() stuff, as sctp_v4_skb_iif() was called but it's return value not used, and now it's not even called, but such cleanup is out of scope for this change.
Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9e238323 |
| 13-Jul-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: allow others to use sctp_input_cb
We process input path in other files too and having access to it is nice, so move it to a header where it's shared.
Signed-off-by: Marcelo Ricardo Leitner <m
sctp: allow others to use sctp_input_cb
We process input path in other files too and having access to it is nice, so move it to a header where it's shared.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-20160713-1, v4.4.15, v4.6.4 |
|
#
8dbdf1f5 |
| 09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement prsctp PRIO policy
prsctp PRIO policy is a policy to abandon lower priority chunks when asoc doesn't have enough snd buffer, so that the current chunk with higher priority can be que
sctp: implement prsctp PRIO policy
prsctp PRIO policy is a policy to abandon lower priority chunks when asoc doesn't have enough snd buffer, so that the current chunk with higher priority can be queued successfully.
Similar to TTL/RTX policy, we will set the priority of the chunk to prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So if PRIO policy is enabled, msg->expire_at won't work.
asoc->sent_cnt_removable will record how many chunks can be checked to remove. If priority policy is enabled, when the chunk is queued into the out_queue, we will increase sent_cnt_removable. When the chunk is moved to abandon_queue or dequeue and free, we will decrease sent_cnt_removable.
In sctp_sendmsg, we will check if there is enough snd buffer for current msg and if sent_cnt_removable is not 0. Then try to abandon chunks in sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and free chunks from out_queue in right order until the abandon+free size > msg_len - sctp_wfree. For the abandon size, we have to wait until it sends FORWARD TSN, receives the sack and the chunks are really freed.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a6c2f792 |
| 09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement prsctp TTL policy
prsctp TTL policy is a policy to abandon chunks when they expire at the specific time in local stack. It's similar with expires_at in struct sctp_datamsg.
This pat
sctp: implement prsctp TTL policy
prsctp TTL policy is a policy to abandon chunks when they expire at the specific time in local stack. It's similar with expires_at in struct sctp_datamsg.
This patch uses sinfo->sinfo_timetolive to set the specific time for TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at. So if prsctp_enable or TTL policy is not enabled, msg->expires_at still works as before.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
826d253d |
| 09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used to dump the prsctp statistics info from the asoc. The prsctp statistics includes ab
sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used to dump the prsctp statistics info from the asoc. The prsctp statistics includes abandoned_sent/unsent from the asoc. abandoned_sent is the count of the packets we drop packets from retransmit/transmited queue, and abandoned_unsent is the count of the packets we drop from out_queue according to the policy.
Note: another option for prsctp statistics dump described in rfc is SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics info from each stream. But by now, linux doesn't yet have per stream statistics info, it needs rfc6525 to be implemented. As the prsctp statistics for each stream has to be based on per stream statistics, we will delay it until rfc6525 is done in linux.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
28aa4c26 |
| 09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PR_SUPPORTED on sctp sockopt
According to section 4.5 of rfc7496, prsctp_enable should be per asoc. We will add prsctp_enable to both asoc and ep, and replace the places where it used
sctp: add SCTP_PR_SUPPORTED on sctp sockopt
According to section 4.5 of rfc7496, prsctp_enable should be per asoc. We will add prsctp_enable to both asoc and ep, and replace the places where it used net.sctp->prsctp_enable with asoc->prsctp_enable.
ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can also modify it's value through sockopt SCTP_PR_SUPPORTED.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1 |
|
#
90017acc |
| 02-Jun-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: Add GSO support
SCTP has this pecualiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big
sctp: Add GSO support
SCTP has this pecualiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to IP layer.
This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with protocol headers and children ones containing the actual segments, already segmented to a way that respects SCTP RFCs.
With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance.
This way SCTP can benefit from GSO and instead of passing several packets through the stack, it can pass a single large packet.
v2: - Added support for receiving GSO frames, as requested by Dave Miller. - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP) - Added heuristics similar to what we have in TCP for not generating single GSO packets that fills cwnd. v3: - consider sctphdr size in skb_gso_transport_seglen() - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9 |
|
#
0970f5b3 |
| 29-Apr-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: signal sk_data_ready earlier on data chunks reception
Dave Miller pointed out that fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") may insert latency specially if the
sctp: signal sk_data_ready earlier on data chunks reception
Dave Miller pointed out that fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") may insert latency specially if the receiving application is running on another CPU and that it would be better if we signalled as early as possible.
This patch thus basically inverts the logic on fb586f25300f and signals it as early as possible, similar to what we had before.
Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") Reported-by: Dave Miller <davem@davemloft.net> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.8, v4.4.7 |
|
#
fb586f25 |
| 08-Apr-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: delay calls to sk_data_ready() as much as possible
Currently processing of multiple chunks in a single SCTP packet leads to multiple calls to sk_data_ready, causing multiple wake up signals wh
sctp: delay calls to sk_data_ready() as much as possible
Currently processing of multiple chunks in a single SCTP packet leads to multiple calls to sk_data_ready, causing multiple wake up signals which are costy and doesn't make it wake up any faster.
With this patch it will note that the wake up is pending and will do it before leaving the state machine interpreter, latest place possible to do it realiably and cleanly.
Note that sk_data_ready events are not dependent on asocs, unlike waking up writers.
v2: series re-checked v3: use local vars to cleanup the code, suggested by Jakub Sitnicki Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
250eb1f8 |
| 08-Apr-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: compress bit-wide flags to a bitfield on sctp_sock
It wastes space and gets worse as we add new flags, so convert bit-wide flags to a bitfield.
Currently it already saves 4 bytes in sctp_sock
sctp: compress bit-wide flags to a bitfield on sctp_sock
It wastes space and gets worse as we add new flags, so convert bit-wide flags to a bitfield.
Currently it already saves 4 bytes in sctp_sock, which are left as holes in it for now. The whole struct needs packing, which should be done in another patch.
Note that do_auto_asconf cannot be merged, as explained in the comment before it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ba6f5e33 |
| 06-Apr-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: avoid refreshing heartbeat timer too often
Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a ra
sctp: avoid refreshing heartbeat timer too often
Currently on high rate SCTP streams the heartbeat timer refresh can consume quite a lot of resources as timer updates are costly and it contains a random factor, which a) is also costly and b) invalidates mod_timer() optimization for not editing a timer to the same value. It may even cause the timer to be slightly advanced, for no good reason.
As suggested by David Laight this patch now removes this timer update from hot path by leaving the timer on and re-evaluating upon its expiration if the heartbeat is still needed or not, similarly to what is done for TCP. If it's not needed anymore the timer is re-scheduled to the new timeout, considering the time already elapsed.
For this, we now record the last tx timestamp per transport, updated in the same spots as hb timer was restarted on tx. Also split up sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the heartbeat one.
On loopback with MTU of 65535 and data chunks with 1636, so that we have a considerable amount of chunks without stressing system calls, netperf -t SCTP_STREAM -l 30, perf looked like this before:
Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000 Overhead Command Shared Object Symbol + 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore - _raw_write_unlock_irqrestore - 96,54% _raw_spin_unlock_irqrestore - 36,14% mod_timer + 97,24% sctp_transport_reset_timers + 2,76% sctp_do_sm + 33,65% __wake_up_sync_key + 28,77% sctp_ulpq_tail_event + 1,40% del_timer - 1,84% mod_timer + 99,03% sctp_transport_reset_timers + 0,97% sctp_do_sm + 1,50% sctp_ulpq_tail_event
And after this patch, now with netperf -l 60:
Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000 Overhead Command Shared Object Symbol + 5,65% netperf [kernel.vmlinux] [k] memcpy_erms + 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string - 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore - _raw_spin_unlock_irqrestore + 49,89% __wake_up_sync_key + 45,68% sctp_ulpq_tail_event - 2,85% mod_timer + 76,51% sctp_transport_reset_t3_rtx + 23,49% sctp_do_sm + 1,55% del_timer + 2,50% netperf [sctp] [k] sctp_datamsg_from_user + 2,26% netperf [sctp] [k] sctp_sendmsg
Throughput-wise, from 6800mbps without the patch to 7050mbps with it, ~3.7%.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-20160329-2, openbmc-20160329-1, openbmc-20160321-1, v4.4.6, v4.5 |
|
#
cea8768f |
| 10-Mar-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: allow sctp_transmit_packet and others to use gfp
Currently sctp_sendmsg() triggers some calls that will allocate memory with GFP_ATOMIC even when not necessary. In the case of sctp_packet_tran
sctp: allow sctp_transmit_packet and others to use gfp
Currently sctp_sendmsg() triggers some calls that will allocate memory with GFP_ATOMIC even when not necessary. In the case of sctp_packet_transmit it will allocate a linear skb that will be used to construct the packet and this may cause sends to fail due to ENOMEM more often than anticipated specially with big MTUs.
This patch thus allows it to inherit gfp flags from upper calls so that it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or similar. All others, like retransmits or flushes started from BH, are still allocated using GFP_ATOMIC.
In netperf tests this didn't result in any performance drawbacks when memory is not too fragmented and made it trigger ENOMEM way less often.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.5 |
|
#
133800d1 |
| 08-Mar-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: fix copying more bytes than expected in sctp_add_bind_addr
Dmitry reported that sctp_add_bind_addr may read more bytes than expected in case the parameter is a IPv4 addr supplied by the user t
sctp: fix copying more bytes than expected in sctp_add_bind_addr
Dmitry reported that sctp_add_bind_addr may read more bytes than expected in case the parameter is a IPv4 addr supplied by the user through calls such as sctp_bindx_add(), because it always copies sizeof(union sctp_addr) while the buffer may be just a struct sockaddr_in, which is smaller.
This patch then fixes it by limiting the memcpy to the min between the union size and a (new parameter) provided addr size. Where possible this parameter still is the size of that union, except for reading from user-provided buffers, which then it accounts for protocol type.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|