What are good UDP timeout and retry values?

The minimum timeout should be at least the Round-Trip-Time (RTT), i.e. twice the one-way path latency, since a reply cannot arrive any sooner than that.

See RFC 908 — Reliable Data Protocol.

The big question is what happens after a timeout: do you retry with the same timeout, or do you double it? This is a complicated decision that depends on the size and frequency of the communication and on how fairly you wish to play with others.

If packets are frequently lost and latency is a concern, look at either keeping the same timeout or ramping up slowly to exponential timeouts, e.g. 1x, 1x, 1x, 1x, 2x, 4x, 8x, 16x, 32x.
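As a rough illustration of the slow ramp-up, here is a minimal sketch that produces the multiplier sequence above. The function name and its parameters (`flat_attempts`, `max_attempts`) are made up for this example, not taken from any library.

```python
def ramped_backoff(base, flat_attempts=4, max_attempts=9):
    """Return one timeout per attempt: flat at `base` first, then doubling.

    `flat_attempts` and `max_attempts` are illustrative defaults chosen to
    reproduce the 1x, 1x, 1x, 1x, 2x, 4x, 8x, 16x, 32x pattern.
    """
    timeouts = []
    for attempt in range(max_attempts):
        # Exponent stays at 0 for the first `flat_attempts` tries,
        # then grows by one per attempt.
        exponent = max(0, attempt - flat_attempts + 1)
        timeouts.append(base * 2 ** exponent)
    return timeouts


print(ramped_backoff(1))  # [1, 1, 1, 1, 2, 4, 8, 16, 32]
```

Setting `flat_attempts=1` degenerates to plain doubling, so the same helper covers both policies.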

If bandwidth isn't much of a concern but latency really is, then follow the UDP-based Data Transfer Protocol (UDT) and force the data through with low timeouts and redundant delivery. This is useful in WAN environments, especially over intercontinental distances, which is why UDT is frequently found within WAN accelerators.

More likely, latency isn't that much of a concern and fairness to other protocols is preferred. In that case use a standard back-off pattern: 1x, 2x, 4x, 8x, 16x, 32x.
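A minimal sketch of the standard doubling pattern on a UDP request/response exchange might look like the following. The function name, the default values and the single-reply protocol are assumptions for illustration; a real client would also check that the reply comes from the expected peer and matches the request.

```python
import socket


def request_with_backoff(payload, addr, base_timeout=1.0, max_attempts=6):
    """Send `payload` to `addr` over UDP, doubling the wait after each timeout.

    Hypothetical helper: returns the first datagram received, or raises
    TimeoutError once `max_attempts` sends have gone unanswered.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        timeout = base_timeout
        for _ in range(max_attempts):
            sock.sendto(payload, addr)
            sock.settimeout(timeout)
            try:
                data, _peer = sock.recvfrom(65535)
                return data
            except socket.timeout:
                timeout *= 2  # 1x, 2x, 4x, 8x, ...
    raise TimeoutError(f"no reply from {addr} after {max_attempts} attempts")
```

With the defaults this waits 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds in total before giving up.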

Ideally the protocol implementation should be advanced enough to derive the optimum timeout and retry periods automatically. When there is no data loss you do not need redundant delivery; when there is data loss you need to increase redundancy. For timeouts, consider reducing the timeout under optimum conditions and then slowing down when congestion occurs, to prevent retransmission storms that behave much like broadcast storms.
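One way to derive the timeout automatically is to borrow the smoothed-RTT estimator that TCP uses (RFC 6298). This is a swapped-in technique rather than anything UDT or RFC 908 prescribes, and the class name and the `minimum`/`maximum` clamps below are assumptions for the sketch.

```python
class AdaptiveTimeout:
    """Track a retransmission timeout from measured round-trip times."""

    def __init__(self, initial=1.0, minimum=0.05, maximum=60.0):
        self.srtt = None      # smoothed RTT
        self.rttvar = None    # RTT variation
        self.rto = initial    # current timeout, in seconds
        self.minimum = minimum
        self.maximum = maximum

    def on_reply(self, rtt):
        """Feed an RTT sample; use only replies to non-retransmitted requests."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RFC 6298 gains: beta = 1/4 for variance, alpha = 1/8 for the mean.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        rto = self.srtt + 4 * self.rttvar
        self.rto = min(self.maximum, max(self.minimum, rto))

    def on_timeout(self):
        """Back off exponentially while requests go unanswered."""
        self.rto = min(self.maximum, self.rto * 2)
```

On a steady path the timeout shrinks towards the measured RTT, and each loss doubles it again, which gives the "fast when healthy, slow when congested" behaviour described above.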


This is similar to question 5227520. Googling "tcp retries" and "tcp retransmission" leads to lots of suggestions that have been tried over the years. Unfortunately, no single solution appears optimum.

I'd choose the timeout T to start at 2 or 3 seconds. My increase X after each failed attempt would be half of T (doubling T seems popular, but you quickly get long timeouts). I'd adjust the retry count R on the fly to be at least 5, and more if necessary, so my total timeout is at least a minute or two.
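To put numbers on that, here is a small sketch. It assumes one reading of "half of T", namely that each wait grows by half of the current T (a factor of 1.5), and that R is raised until the waits add up to at least a minute. The function and parameter names are invented for the example.

```python
def retry_plan(initial_timeout=2.0, growth=1.5, min_retries=5, min_total=60.0):
    """Return the list of waits: at least `min_retries`, totalling `min_total`."""
    timeouts = []
    timeout = initial_timeout
    # Keep adding attempts until both the retry floor and the total are met.
    while len(timeouts) < min_retries or sum(timeouts) < min_total:
        timeouts.append(timeout)
        timeout *= growth  # assumed reading: grow by half of the current T
    return timeouts


plan = retry_plan()
print(len(plan), plan, sum(plan))
# 7 [2.0, 3.0, 4.5, 6.75, 10.125, 15.1875, 22.78125] 64.34375
```

For comparison, `retry_plan(growth=2.0)` reaches a minute in 5 attempts, but its final wait is 32 seconds rather than about 23.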

I'd be careful not to leave R and T too high if subsequent transactions are usually quicker; you might want to lower R and T as your stats allow so you can retry and get a quick response instead of leaving R and T at their max (especially if your clients are human and you want to be responsive).

Keep in mind: you're never going to be as reliable as an algorithm that retries more than you do, if those extra retries succeed. On the other hand, if your server is always available and always "responds essentially instantly", then a client that fails to see a response has hit a failure outside your server's control, and the only thing that can be done is for the client to retry (although a retry can be more than just resending, such as closing and reopening the connection, trying a backup server at a different IP, etc.).