YouTube URL algorithm?

There is no need to use a hash. It is probably just a quasi-random 64-bit value passed through base64 or some equivalent.

By quasi-random, I mean it is just a one-to-one mapping with the counting integers, just shuffled.

For example, you could take a monotonically increasing database id, multiply it by some prime near 2^64 (keeping the result modulo 2^64, so it still fits in 64 bits), then base64 the result. If you did not want people to be able to guess, you might choose a more complex mapping or just pick a random number that is not in the database yet.
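A minimal sketch of that multiply-and-encode idea, not YouTube's actual scheme. The multiplier below is a hypothetical constant chosen only because it is odd. Any odd multiplier (which includes any prime near 2^64) is invertible modulo 2^64, so the mapping stays one-to-one. The function names are made up for illustration, and `pow(x, -1, m)` needs Python 3.8 or later.

```python
import base64

MODULUS = 1 << 64
# Hypothetical odd constant; oddness is what makes it invertible modulo 2**64.
MULTIPLIER = 0x9E3779B97F4A7C15
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)


def encode_id(n: int) -> str:
    """Shuffle a sequential id and render it as 11 URL-safe base64 characters."""
    shuffled = (n * MULTIPLIER) % MODULUS
    raw = shuffled.to_bytes(8, "big")
    # 8 bytes encode to 12 characters ending in one "="; drop the padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(token: str) -> int:
    """Invert encode_id: restore the padding, decode, and undo the multiply."""
    raw = base64.urlsafe_b64decode(token + "=")
    shuffled = int.from_bytes(raw, "big")
    return (shuffled * INVERSE) % MODULUS


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, encode_id(n), decode_id(encode_id(n)))
```

Consecutive ids come out looking unrelated, but anyone who learns the multiplier can invert the mapping, which is why the answer suggests a more complex mapping if guessing is a concern.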

Normal base64 would add an equals sign at the end as padding, but in this case it is implied because the size is known. The character mapping could easily be something besides the standard one.
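To illustrate the padding point, here is a small sketch using Python's standard `base64` module. The 8-byte value is arbitrary, and the URL-safe alphabet (which swaps `+` and `/` for `-` and `_`) is used as one example of a non-standard character mapping.

```python
import base64

raw = bytes(range(8))  # any 8-byte (64-bit) value will do

padded = base64.urlsafe_b64encode(raw).decode("ascii")
unpadded = padded.rstrip("=")

# Because the length is known, the padding can be recomputed before decoding.
missing = -len(unpadded) % 4
restored = base64.urlsafe_b64decode(unpadded + "=" * missing)

print(len(padded), len(unpadded), restored == raw)
```

A 64-bit value always encodes to 12 characters with exactly one trailing `=`, so dropping it leaves the familiar 11 characters without losing information.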


YouTube uses Base64 encoding to generate an ID for each video. The characters used in the IDs are:

(A-Z) + (a-z) + (0-9) + (-) + (_). (64 Characters).

Using Base64 encoding and only 11 characters, they can generate more than 73 quintillion unique IDs. How large a pool of IDs is that?

Well, it's enough for everyone on earth to produce a video every single minute for about 18,000 years.

And they have achieved such a huge number using only 11 characters (64*64*64*64*64*64*64*64*64*64*64 = 64^11). If they need more IDs, they just have to add one more character to the IDs, which multiplies the pool by 64.
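The figures above are easy to check. The population value in this sketch is an assumption (roughly 7.8 billion), not a number from the answer, so the result in years is only approximate.

```python
ALPHABET_SIZE = 64
ID_LENGTH = 11

total_ids = ALPHABET_SIZE ** ID_LENGTH            # 64^11, which equals 2^66
with_one_more_char = ALPHABET_SIZE ** (ID_LENGTH + 1)

# Assumption for the estimate: about 7.8 billion people, one video per minute each.
POPULATION = 7_800_000_000
MINUTES_PER_YEAR = 60 * 24 * 365

years_to_exhaust = total_ids / (POPULATION * MINUTES_PER_YEAR)

print(f"{total_ids:,} possible IDs")
print(f"about {years_to_exhaust:,.0f} years to use them all")
```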

So when a video is uploaded to YouTube, they basically pick one of those 73+ quintillion possibilities at random and check whether it is already taken. If it is not, they use it; otherwise they pick another one.
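The pick-and-retry approach described above could look roughly like this. It is a sketch only: the `taken` set stands in for whatever database lookup a real system would use, and `new_video_id` is a hypothetical name.

```python
import secrets

# The 64 characters listed above: A-Z, a-z, 0-9, "-" and "_".
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def new_video_id(taken: set, length: int = 11) -> str:
    """Draw random IDs until one is found that is not already in `taken`."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in taken:
            taken.add(candidate)  # reserve it so it is never handed out again
            return candidate


if __name__ == "__main__":
    used = set()
    print([new_video_id(used) for _ in range(3)])
```

With a pool this large the retry loop almost never runs more than once. In a real database the check and the insert would need to be atomic, for example through a unique constraint on the ID column.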

Refer to this video for a detailed explanation.


Using some non-trivial hashing function. The probability of collision is very low, depending on the function, the parameters and the input domain. Keep in mind that cryptographic hashes were specifically designed to have very low collision rates for non-random input (i.e. completely different hashes for two close-but-unequal inputs).
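As a rough illustration of the hashing approach, the sketch below truncates a SHA-256 digest to 64 bits and base64-encodes it. The choice of SHA-256, the truncation length, and the input keys are all assumptions made for the example; nothing here is known to be what YouTube does.

```python
import base64
import hashlib


def hash_id(key: str) -> str:
    """Derive an 11-character ID from the first 64 bits of a SHA-256 digest."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest[:8]).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Two close-but-unequal inputs should give unrelated-looking IDs.
    print(hash_id("video-1"))
    print(hash_id("video-2"))
```

Truncating to 64 bits keeps the ID short but makes collisions far more likely than with the full digest: by the birthday bound they become probable once a few billion IDs exist, so a uniqueness check would still be needed.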

This post by Jeff Atwood is a nice overview of the topic.

And here is an online hash calculator you can play with.
