Typical URL lengths for storage calculation purposes (URL-shortener)

I'm not sure what is typical, but of the 11,000 URLs in our request database, the average length is 62 characters. Hundreds of them are several hundred characters long, and the longest is a Google Translate link at 1,689 characters.

Top 10 values of len(producturl):
1689
792
707
693
647
606
574
569
562
560
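Summary figures like the average and top-10 list above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the URLs have already been loaded into a list of strings; the sample URLs and the `length_summary` helper are hypothetical, and Python's `len` counts characters (code points), not bytes.

```python
import heapq
from statistics import mean

def length_summary(urls, top_n=10):
    """Return the mean URL length and the top_n longest lengths, descending."""
    lengths = [len(u) for u in urls]
    return mean(lengths), heapq.nlargest(top_n, lengths)

# Hypothetical sample; in practice this would be the producturl column.
sample = [
    "http://example.com/",
    "http://example.com/a/b/c?x=1",
    "https://example.org/path",
]
avg, longest = length_summary(sample, top_n=2)
print(round(avg, 2), longest)
```

`heapq.nlargest` avoids sorting the whole column when only the longest few values are needed.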

Sample URL (647 characters):

http://www.amazon.co.jp/%E9%AD%94%E7%95%8C%E6%88%A6%E8%A8%98%E3%83%87%E3%82%A3%E3%82%B9%E3%82%AC%E3%82%A4%E3%82%A24-%E5%88%9D%E5%9B%9E%E9%99%90%E5%AE%9A%E7%89%88-%E5%A0%95%E5%A4%A9%E4%BD%BF%E3%83%95%E3%83%AD%E3%83%B3-%E3%83%97%E3%83%AD%E3%83%80%E3%82%AF%E3%83%88%E3%82%B3%E3%83%BC%E3%83%89%E4%BB%98%E3%81%8D%E7%89%B9%E8%A3%BD%E3%82%AB%E3%83%BC%E3%83%89-%E3%83%88%E3%83%AC%E3%83%BC%E3%83%87%E3%82%A3%E3%83%B3%E3%82%B0%E3%82%AB%E3%83%BC%E3%83%89%E3%80%8C%E3%83%B4%E3%82%A1%E3%82%A4%E3%82%B9%E3%82%B7%E3%83%A5%E3%83%B4%E3%82%A1%E3%83%AB%E3%83%84%E3%80%8D%E9%99%90%E5%AE%9APR%E3%82%AB%E3%83%BC%E3%83%89%E4%BB%98%E3%81%8D/dp/B0043RT8UO/ref=pd_rhf_p_t_1

P.S. For estimating purposes, extrapolate from a representative dataset after using the standard deviation to discard outliers that could distort your mean.
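A minimal sketch of that advice, assuming "throw out the outliers" means dropping any length more than `k` sample standard deviations from the mean (a simple z-score filter). The cutoff `k=2.0`, the helper name, and the sample lengths are hypothetical choices for illustration. One caveat: a single huge outlier also inflates the standard deviation, so in small samples a median- or IQR-based cutoff can be more robust.

```python
from statistics import mean, stdev

def mean_without_outliers(lengths, k=2.0):
    """Drop values more than k sample standard deviations from the mean,
    then return (mean of the remaining values, number of values dropped)."""
    if len(lengths) < 2:
        return mean(lengths), 0  # stdev needs at least two data points
    mu, sigma = mean(lengths), stdev(lengths)
    kept = [x for x in lengths if abs(x - mu) <= k * sigma]
    return mean(kept), len(lengths) - len(kept)

# Hypothetical lengths: eight typical URLs plus one 1689-character outlier.
lengths = [60, 62, 65, 58, 61, 59, 63, 64, 1689]
print(mean(lengths))                   # raw mean, pulled far up by the outlier
print(mean_without_outliers(lengths))  # (61.5, 1)
```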


From RFC 2068 section 3.2.1:

The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).

Note: Servers should be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.
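As an illustration of the 414 behaviour the RFC describes, here is a minimal sketch using Python's standard-library `http.server`. The 255-byte limit simply borrows the number from the note above; `MAX_URI_BYTES` and `LimitedURIHandler` are hypothetical names, and a real server would choose its own limit.

```python
from http.server import BaseHTTPRequestHandler

MAX_URI_BYTES = 255  # hypothetical limit, borrowed from the RFC's 255-byte caution

class LimitedURIHandler(BaseHTTPRequestHandler):
    """Answers GET requests, rejecting over-long request targets with 414."""

    def do_GET(self):
        # http.server decodes the request line as ISO-8859-1, so re-encoding
        # with latin-1 recovers the raw byte length of the request target.
        if len(self.path.encode("latin-1")) > MAX_URI_BYTES:
            self.send_error(414)  # Request-URI Too Long
            return
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass the handler to `HTTPServer(("127.0.0.1", 8000), LimitedURIHandler)` (also from `http.server`) and call `serve_forever()`; a GET with a path longer than 255 bytes then receives a 414 response.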

Although IE (and probably most other browsers) supports much longer URIs, I don't believe most forms or client-side apps rely on anything above 255 bytes working. Your server logs should give you statistics on the kinds of URLs you are actually seeing.
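One way to pull those statistics out of the logs is sketched below. It assumes access logs in Common Log Format, where the quoted request line (`"GET /path HTTP/1.1"`) is the first quoted field; the regex, the helper names, and the sample log lines are assumptions for illustration. It measures each request target in bytes and counts how many exceed the 255-byte threshold.

```python
import re
from statistics import mean

# Matches the quoted request line of a Common Log Format entry, e.g.
# "GET /index.html HTTP/1.1", and captures the request target.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+)(?: HTTP/[0-9.]+)?"')

def request_target_lengths(log_lines):
    """Yield the byte length of each request target found in the log lines."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            yield len(match.group(1).encode("utf-8"))

def summarize(log_lines, limit=255):
    lengths = list(request_target_lengths(log_lines))
    if not lengths:
        return {"count": 0, "mean": 0.0, "max": 0, "over_limit": 0}
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "max": max(lengths),
        "over_limit": sum(1 for n in lengths if n > limit),
    }

# Hypothetical log lines: one short request and one 301-byte request target.
sample_log = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /' + "x" * 300 + ' HTTP/1.1" 200 512',
]
print(summarize(sample_log))
```

Logged request targets are normally percent-encoded ASCII, so byte and character counts usually agree; encoding to UTF-8 just keeps the measurement in bytes if they don't.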


This is probably unknowable without indexing the entire Internet, but according to an analysis by Kelvin Tan of a dataset of 6,627,999 unique URLs from 78,764 unique domains, the mean URL length is 76.97 characters:

Mean: 76.97
Standard Deviation: 37.41
95th% confidence interval: 157
99.5th% confidence interval: 218
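For storage planning, figures like these turn into a rough byte estimate and a column-width choice. The sketch below assumes percent-encoded (ASCII) URLs stored at one byte per character, and reads the 157 and 218 figures as percentiles of the length distribution; that reading is an assumption, since the source labels them confidence intervals. `nearest_rank_percentile`, `estimated_bytes`, and the per-row overhead parameter are hypothetical helpers.

```python
import math

def nearest_rank_percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def estimated_bytes(url_count, mean_length, overhead_per_row=0):
    """Rough storage for url_count URLs at one byte per character, plus a
    hypothetical fixed per-row overhead (length prefix, index entry, ...)."""
    return url_count * (mean_length + overhead_per_row)

# Figures quoted above: 6,627,999 URLs with a mean length of 76.97 characters.
print(round(estimated_bytes(6_627_999, 76.97)))  # roughly 510 million bytes

# Hypothetical sample of lengths, for the percentile helper.
lengths = list(range(1, 201))
print(nearest_rank_percentile(lengths, 95), nearest_rank_percentile(lengths, 99.5))
```

Sizing a fixed-width column to a high percentile keeps it compact but still truncates the tail (for example, the 1,689-character Google Translate link mentioned earlier), so a variable-length type with a generous upper bound is the safer choice when every URL must round-trip.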