Estimating the size of a rainbow table
RainbowCrack is probably what you would be using to generate rainbow tables. Rainbow tables are always generated over a keyspace, such as alpha-numeric 5-9 bytes long and the chain length and count which will affect the rate and the size of the resulting tables. If you have an input file, then it's not a rainbow table, it's some other lookup table. A rainbow table is a speical type of lookup table with neat properties. Such as the size of the hash function (sha256
vs sha512
) doesn't affect the size of the rainbow table.
There are some matlab scripts floating around to calculate the size of rainbow table, however this site is easier to use.
A rainbow table is characterized by its average chain length. It is a parameter which is chosen at table construction time. Let's call it t.
If the table covers about N tentative passwords, then it will have a size of about N/t "entries", where an entry is a "chain end". The encoded size of a chain end depends on a lot of details, but it will typically be as large, or somewhat larger, than a field which can hold the integer value N. In other words, if N is close to 240, each entry will need at least 5 bytes, but probably a bit more than that, say 8 bytes.
To save on storage cost, you will want to make t as big as possible. However, you cannot make it as big as you want, because it raises other costs. Briefly stated:
- The table construction cost is about 1.7*N evaluations of the hash function.
- The storage cost is N/t chain ends.
- The CPU cost when attacking is about t2 evaluations of the hash function.
- The lookup cost when attacking is about t random accesses in the tables.
A modern hard disk allows about 100 lookups per second, so if you want to keep attack time below one minute, you cannot have t higher than 6000. You can have much higher values for t if you use SSD (which allow for many thousands of random accesses per second), but storage cost increases because SSD are quite expensive. Also, if t becomes too high, the quadratic CPU cost can become prohibitive.
The difference between a rainbow table, what existed before (Hellman's time-memory trade-off), and the modern variants which mix ideas from Hellman, Oechslin, Rivest, Biham and a few others, is a factor of at most 2 in CPU and lookup costs.