How to calculate the entropy of a file?

  • At the end: Calculate the "average" value for the array.
  • Initialize a counter with zero, and for each of the array's entries: add the entry's difference from the "average" to the counter.
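The two steps above could be sketched roughly as follows in Python. This assumes the array is the 256-entry table of byte counts used further down, and that "difference" means the absolute difference (signed differences from the average always sum to zero). The function names are made up for illustration.

```python
def byte_counts(data: bytes) -> list[int]:
    """Count how often each of the 256 possible byte values occurs."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


def deviation_from_average(data: bytes) -> float:
    """Sum of absolute differences between each count and the average count.

    Assumption: absolute differences are intended, since signed ones cancel out.
    """
    counts = byte_counts(data)
    average = sum(counts) / 256
    counter = 0.0
    for c in counts:
        counter += abs(c - average)
    return counter
```

A result near zero means the byte values are spread evenly (the data looks random); a large result means a few byte values dominate.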

With some modifications you can get Shannon's entropy:

rename "average" to "entropy"

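(float) entropy = 0
for i in 0..255 do   // Counts[256] holds how often each byte value occurs in the file
  (float) p = (float) Counts[i] / filesize   // floating-point division, not integer
  if (p > 0) entropy = entropy - p * lg(p)   // lg is the logarithm with base 2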

Edit: As Wesley mentioned, the result above is in bits per byte (range 0..8), so we must divide the entropy by 8 to normalize it to the range 0..1 (or alternatively, use a logarithm with base 256).
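For reference, here is a minimal runnable sketch of the same calculation in Python, including the division by 8. The function names are my own, and the whole input is assumed to fit in memory.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0  # avoid dividing by zero for an empty input
    filesize = len(data)
    entropy = 0.0
    # Counter only yields byte values that occur, so p > 0 always holds here.
    for count in Counter(data).values():
        p = count / filesize
        entropy -= p * math.log2(p)
    return entropy


def normalized_entropy(data: bytes) -> float:
    """Entropy scaled to the range 0.0 to 1.0 by dividing by 8 bits."""
    return shannon_entropy(data) / 8
```

For a file, read it in binary mode first, e.g. `normalized_entropy(open(path, "rb").read())`, where `path` is whatever file you want to measure.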

A simpler solution: gzip the file. Use the ratio of file sizes, (size-of-gzipped)/(size-of-original), as a measure of randomness (i.e. entropy).

This method doesn't give you the exact absolute value of entropy (because gzip is not an "ideal" compressor), but it's good enough if you need to compare the entropy of different sources.
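A rough sketch of the gzip approach using Python's standard `gzip` module instead of the command-line tool; `gzip_ratio` is a name I made up, and the data is compressed in memory rather than written to disk.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Return (size-of-gzipped) / (size-of-original) for the given bytes."""
    if not data:
        return 0.0  # ratio is undefined for empty input; 0.0 is an arbitrary choice
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    # Highly repetitive data compresses well; random data barely compresses.
    print("zeros :", gzip_ratio(b"\x00" * 100_000))
    print("random:", gzip_ratio(os.urandom(100_000)))
```

Note that the ratio can come out slightly above 1 for incompressible data, because gzip adds a small header and trailer. That doesn't matter when you only compare sources against each other.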