| Linode Forum https://forum.linode.com/ |
|
| poor performance on TCP/IP socket (Ubuntu Lucid) on Linode https://forum.linode.com/viewtopic.php?f=19&t=6151 |
Page 1 of 1 |
| Author: | icanbob [ Wed Oct 20, 2010 3:15 pm ] |
| Post subject: | poor performance on TCP/IP socket (Ubuntu Lucid) on Linode |
I'm running a custom TCP/IP socket application which listens on port 8000 by default. The app works, but I'm getting poor throughput on that socket (~650 ms for an 8k transfer). I'm not seeing any other obvious performance issues with my Linode, e.g. Ubuntu repository access and file downloads are fine. Is there some throttling that Ubuntu or the Linode imposes that I'm not aware of? Thanks in advance for your response. bob |
|
| Author: | hybinet [ Wed Oct 20, 2010 5:19 pm ] |
| Post subject: | |
It's most probably your app... Does it work better on a different computer? |
|
| Author: | jed [ Wed Oct 20, 2010 5:44 pm ] |
| Post subject: | Re: poor performance on TCP/IP socket (Ubuntu Lucid) on Lino |
icanbob wrote: Is there some throttling that Ubuntu or the Linode imposes that I'm not aware of?
None that you're hitting, by those numbers. There are a couple of safety valves in place, but it takes work to hit them. |
|
| Author: | icanbob [ Wed Oct 20, 2010 9:00 pm ] |
| Post subject: | |
hybinet wrote: It's most probably your app... Does it work better on a different computer?
This is a mature app that has been used in networking situations for 10 years. To give a feel for how far off these numbers are, the exact same app running against the Linode TCP/IP stack (i.e. connecting to localhost) yields a result of 30 usec. Local messaging without the TCP/IP stack on the same Linode gives 8 usec for 8k messages. Connecting across the Net to the Linode, one would expect performance lower than localhost, possibly by an order of magnitude. My results are showing 4 orders of magnitude slower than the local Linode TCP/IP stack. bob |
|
| Author: | vonskippy [ Wed Oct 20, 2010 9:39 pm ] |
| Post subject: | |
icanbob wrote: My results are showing 4 orders of magnitude slower than the local Linode TCP/IP stack.
Um...no. An "order of magnitude" is TEN TIMES the compared number. http://en.wikipedia.org/wiki/Order_of_magnitude And how do you measure 8/1,000,000 seconds or 30/1,000,000 seconds with any type of accuracy on a VPS system? |
|
| Author: | hybinet [ Wed Oct 20, 2010 9:43 pm ] |
| Post subject: | |
vonskippy wrote: Um...no.
The OP is right about four orders of magnitude: 650 milliseconds is roughly 21,700 times 30 microseconds. OP: Where are you accessing this socket from? What is the latency between the two locations? Or is this on the same machine, just using the public interface instead of localhost? If the difference between localhost and the public interface is that big, you should probably open a ticket. |
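[Editor's note: the "four orders of magnitude" arithmetic disputed here is easy to check. This snippet is an illustration added for this write-up, not part of the original thread.]

```python
import math

# 650 ms round trip over the net vs. 30 usec over loopback
remote = 650e-3   # seconds
loopback = 30e-6  # seconds

ratio = remote / loopback
orders = math.log10(ratio)
print(f"ratio: {ratio:.0f}x, {orders:.1f} orders of magnitude")
# → ratio: 21667x, 4.3 orders of magnitude
```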
|
| Author: | vonskippy [ Wed Oct 20, 2010 10:18 pm ] |
| Post subject: | |
Oopsie - I thought he was saying 30 usec was 4 orders of magnitude greater than 8 usec. |
|
| Author: | hoopycat [ Thu Oct 21, 2010 6:06 am ] |
| Post subject: | |
Sure it's not just because there's a lot of distance in the middle? I could see ~600 ms being a minimum amount of time for a full TCP session, once handshake, slow start, and shutdown are accounted for. |
|
| Author: | hoopycat [ Thu Oct 21, 2010 7:38 am ] |
| Post subject: | |
Thinking about this a bit more, it would be very interesting to see tcpdump output from both the server and client end (preferably with synchronized clocks)... assuming both are quiet:

Code: tcpdump -n -i eth0 port not 22

That will hopefully pinpoint exactly where the delay is coming in. |
|
| Author: | icanbob [ Thu Oct 21, 2010 9:50 am ] |
| Post subject: | |
hoopycat wrote: Thinking about this a bit more, it would be very interesting to see tcpdump output from both the server and client end (preferably with synchronized clocks)... assuming both are quiet,
tcpdump -n -i eth0 port not 22 That will hopefully pinpoint exactly where the delay is coming in.
My client side is an older MEPIS system, and it appears that only libpcap is installed there. I did install tcpdump on the Linode and reran a test with an exchange of 10 8K messages. The resulting Linode-side tcpdump file is here (wget www.icanprogram.com/downloads/dump.gz). I am not experienced in analysing this tcpdump output, so any help is appreciated. Interestingly, the 10 8k messages averaged 185 msec round trip today (a little better than the 650 msec I was seeing yesterday, but still very slow compared to the Linode loopback test). Also interesting: if I use the domain name assigned to the Linode to set up the socket, the time goes up by a factor of 2, even though I'm timing the transaction from after the socket is opened (at least the ftime() call is after the socket-opening section in the code). Thanks in advance for all your help. bob |
|
| Author: | Guspaz [ Thu Oct 21, 2010 10:48 am ] |
| Post subject: | |
If your test works like this:

1) Establish a connection
2) Send 8KB of data
3) Close the connection
4) Repeat

then you're going to get hundreds of milliseconds of latency, yes. You're effectively using a connection-oriented protocol to do the job of a datagram-oriented protocol. There are various reliable datagram protocols available, some of them even built into the Linux kernel (like RDS). Here are the round trips required for a server to send a small amount of data to a client. I'll assume 50ms of latency in each direction (100ms round trip):

1) Client sends a SYN packet to the remote host. 50ms pass.
2) Server receives the SYN and replies with SYN/ACK. 50ms pass.
3) Client receives the SYN/ACK and sends an ACK. 50ms pass.
4) Server receives the ACK, completing the "three-way handshake", and starts sending data. A minimum of 50ms pass before the first data reaches the client.
5) The client sends one or more ACK packets for the data it received. 50ms pass.
6) The server receives the final ACK and, knowing all data has been sent, sends a FIN packet to terminate the connection. 50ms pass.
7) The client receives the FIN, sends an ACK to acknowledge it, and sends its own FIN to close the local side of the connection. 50ms pass.
8) The server receives the ACK to its FIN and the client's FIN, and sends its own ACK in reply. 50ms pass.
9) The client receives the server's ACK. The connection is now closed, and the client can consider it properly terminated.

The total time required for this procedure is 8 x 50ms, or 400ms. This is because TCP is designed for long-lived connections, not short datagram bursts. Technically, in the above exchange, the client has all the data 50ms after the handshake completes (200ms in) and can go and use it, even though it won't know the server has closed its side of the connection until the teardown finishes. 
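[Editor's note: the accounting above reduces to a tiny model: count the one-way trips and multiply by the one-way latency. A minimal sketch, using the same 50 ms assumption as the example (not a measurement), added for this write-up:]

```python
ONE_WAY_MS = 50  # assumed one-way latency, as in the example above

# One-way trips in the exchange described above.
trips = ["SYN", "SYN/ACK", "ACK", "data", "data ACK",
         "FIN", "client FIN+ACK", "final ACK"]

total_ms = len(trips) * ONE_WAY_MS
data_ready_ms = 4 * ONE_WAY_MS  # client holds all the data after the 4th trip
print(f"connection fully closed after {total_ms} ms")   # → 400 ms
print(f"client has the data after {data_ready_ms} ms")  # → 200 ms
```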
Here is how an unreliable (in that there is no delivery guarantee) UDP exchange would work (again, 50ms latency each way, 100ms round trip):

1) Client sends a datagram requesting data. 50ms pass.
2) Server sends the data in reply, tagged with some sort of EOF marker to let the client know there's no more data. 50ms pass.
3) Client receives the data.

Total elapsed time is 100ms, much lower than the 400ms that TCP required. It's possible to do reliable UDP with no increase in latency over normal UDP (so long as there is no packet loss). You never actually mentioned what the round-trip latency between your two hosts is, hence the assumed 50ms each way. But it's important to understand that, because you're (mis)using TCP, there is not one round trip involved here: multiple round trips are required to establish, use, and tear down the TCP session. So the difference you're seeing between localhost (where round-trip latency is near zero) and the internet (where it is immensely higher) makes perfect sense. That's how networking works. |
|
| Author: | hoopycat [ Thu Oct 21, 2010 11:45 am ] |
| Post subject: | |
Relative to 13:31:40.548003, here's how the times look:

13:31:40.548003 (0.000000): SYN received
13:31:40.548093 (0.000090): SYN+ACK sent
13:31:40.660762 (0.112759): ACK received
13:31:40.660987 (0.112984): Payload 1 received (168 bytes)
13:31:40.661060 (0.113057): Payload 1 ack sent
13:31:40.661828 (0.113825): Payload 2 sent (40 bytes)
13:31:40.758238 (0.210235): Payload 2 ack received

I gotta run, so I'm not going to compute beyond there, but it's taking about ~100ms for each round trip and we're already 210ms in with only 208 payload bytes exchanged. The window will eventually expand so that it won't wait for an ack before sending more payload, but it will start off slow to avoid overdoing it. Consider moving closer to your Linode, using UDP instead of TCP, or keeping the connections up longer. |
|
| Author: | icanbob [ Thu Oct 21, 2010 12:00 pm ] |
| Post subject: | |
Guspaz wrote: If your test works like this:
1) Establish a connection 2) Send 8KB of data 3) Close the connection 4) Repeat then you're going to get hundreds of milliseconds of latency, yes.
The code is open source and is available from http://www.icanprogram.com/simpl. The test is performed in the following manner:

a) open a TCP/IP socket
b) mark the time
c) loop N times: write the 8k message from the client side; write the 8k response from the Linode side
d) mark the time on loop exit
e) close the socket
f) report the results

Allow me to explain. My test consists of 4 processes (2 on each end of the TCP/IP socket). Two of those processes are SIMPL modules (from simpl/benchmarks/src in the source above): sender is on my end; receiver is on the Linode end. SIMPL is a Send/Receive/Reply (QNX-style) blocking protocol, so the sender is blocked while the message is transported over the TCP/IP socket and the response is transported back. The other two processes are surrogates (from simpl/surrogates/tcp/src in the source above). The one on my side "pretends" to be the receiver; the one on the Linode side "pretends" to be the sender. The loopback test which exchanges 8k messages in 30 usec uses exactly the same 4 processes; the only difference is that the socket is opened internally to the Linode. In the subject test the socket is opened between my desktop and the Linode and by necessity involves many other network bits. However, so does a file transfer between the Ubuntu repository and the Linode. In my test very little data is transferred (8k x 2 per transaction) in comparison, but it appears to take 100's of milliseconds per transaction. I have used SIMPL in many different networked configurations (mostly captive networks) over the past 10 years. To my knowledge this is the first time it has been tried from a Linode, and the first time I'm seeing this kind of latency. I'd love to understand its source. Thanks again in advance for all your help. bob |
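[Editor's note: for readers who want to reproduce the shape of this test without SIMPL, here is a rough sketch of steps a) through f) using plain Python sockets over loopback, added for this write-up. It illustrates the measurement procedure only; it is not the SIMPL surrogate code.]

```python
import socket
import threading
import time

MSG = 8 * 1024   # 8k messages, as in bob's test
N = 10           # loop count

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

# "Linode side": echo each 8k message back.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # ephemeral port
listener.listen(1)

def echo_side():
    conn, _ = listener.accept()
    with conn:
        for _ in range(N):
            conn.sendall(recv_exact(conn, MSG))

threading.Thread(target=echo_side, daemon=True).start()

# "Client side": a) open the socket...
client = socket.create_connection(listener.getsockname())
payload = b"x" * MSG
start = time.perf_counter()             # b) mark the time
for _ in range(N):                      # c) loop N times:
    client.sendall(payload)             #    write the 8k message
    recv_exact(client, MSG)             #    read the 8k response
elapsed = time.perf_counter() - start   # d) mark the time on exit
client.close()                          # e) close the socket
print(f"avg round trip: {elapsed / N * 1e6:.0f} usec")  # f) report
```

Run over loopback, this keeps the connection open for the whole loop, so the handshake and teardown costs discussed in the previous post are paid only once rather than per message.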
|
| Author: | icanbob [ Thu Oct 21, 2010 12:06 pm ] |
| Post subject: | |
hoopycat wrote: Relative to 13:31:40.548003, here's how the times look:
13:31:40.548003 (0.000000): SYN received
13:31:40.548093 (0.000090): SYN+ACK sent
13:31:40.660762 (0.112759): ACK received
13:31:40.660987 (0.112984): Payload 1 received (168 bytes)
13:31:40.661060 (0.113057): Payload 1 ack sent
13:31:40.661828 (0.113825): Payload 2 sent (40 bytes)
13:31:40.758238 (0.210235): Payload 2 ack received
I gotta run, so I'm not going to compute beyond there, but it's taking about ~100ms for each round trip and we're already 210ms in with only 208 payload bytes exchanged. The window will eventually expand such that it won't wait for an ack before sending more payload, but it will start off slow to avoid overdoing it. Consider moving closer to your Linode, or use UDP instead of TCP, or keep the connections up longer.
This sounds like the section of code responsible for setting up the socket and surrogates, not the section inside my timing loop. The tcpdump covers the entire time window, including the socket setup and shutdown, not just my timing loop. Those payloads should be 8k in size. Can you help me see those in the tcpdump output? Thanks again in advance for all your help. bob |
|
| Author: | hoopycat [ Thu Oct 21, 2010 1:06 pm ] |
| Post subject: | |
icanbob wrote: Can you help me see those in the tcpdump output?
Most likely, the client->server burst is at 13:31:40.774513 through 13:31:40.939390:

Code: 13:31:40.774513 IP 208.83.215.196.4910 > 74.207.236.47.8001: Flags [.], seq 169:1617, ack 41, win 1460, options [nop,nop,TS val 2900805 ecr 771

Lines 1 and 2 are the first hunk (~3k) of data arriving at the Linode, and line 3 is the acknowledgement leaving your Linode. Note the pause between lines 3 and 4: that is the sender waiting to get the acknowledgment from the Linode. If you have ntpd running and the clocks are well synced on both ends, take a look at tcpdumps on both ends simultaneously: you'll be able to see the delay. For completeness, the server->client burst is at 13:31:40.939764 through 13:31:41.011796:

Code: 13:31:40.939764 IP 74.207.236.47.8001 > 208.83.215.196.4910: Flags [.], seq 41:2937, ack 8293, win 758, options [nop,nop,TS val 771000279 ecr 2900970], length 2896

Haven't looked at the code yet... have to get back to the grind for a bit. |
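[Editor's note: an aside on reading those tcpdump lines. The "length 2896" in the server->client burst is two back-to-back maximum-size segments: with a standard 1500-byte Ethernet MTU, 40 bytes of IP+TCP headers, and the 12-byte TCP timestamp option (visible as "TS val ... ecr ..." in the capture), each segment carries 1448 payload bytes. A quick sanity check, assuming those standard sizes:]

```python
MTU = 1500             # standard Ethernet MTU (assumed)
IP_TCP_HEADERS = 40    # 20-byte IP header + 20-byte TCP header
TS_OPTION = 12         # TCP timestamp option, as seen in the capture

mss = MTU - IP_TCP_HEADERS - TS_OPTION
print(mss)       # → 1448 payload bytes per segment
print(2 * mss)   # → 2896, matching the captured "length 2896"
```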
|
| All times are UTC-04:00 |
| Powered by phpBB® Forum Software © phpBB Group http://www.phpbb.com/ |
|