Entropy[...] doesn't calculate the entropy of a probability distribution? What does it do?
It seems Mathematica's Entropy is equivalent to the following code (at least for lists of symbols and strings):
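(* Entropy of a list: p holds the relative frequency of each distinct element *)
entropy[list_List] :=
  With[{p = Tally[list][[All, 2]]/Length[list]},
    -p.Log[p]
  ]
(* Entropy of a string: the same formula applied to its characters *)
entropy[str_String] :=
  With[{p = Tally[Characters@str][[All, 2]]/StringLength[str]},
    -p.Log[p]
  ]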
You can try this on the examples from the Entropy help page to see that the result is the same:
entropy[{0, 1, 1, 4, 1, 1}] == Entropy[{0, 1, 1, 4, 1, 1}]
(* True *)
entropy["A quick brown fox jumps over the lazy dog"] ==
Entropy["A quick brown fox jumps over the lazy dog"]
(* True *)
This means that Mathematica calculates the entropy using the natural logarithm (Log, base e), so the result is measured in nats. With 2 as the base of the Log you get the entropy in bits (also called shannons), and with 10 as the base you get it in hartleys.
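Entropy also has a two-argument form, Entropy[k, list], which gives the base-k entropy directly. As a minimal sketch of the same idea for the hand-written version, assume a helper called entropyBase (a name made up here, not a built-in) that passes the base through to Log:
(* entropyBase is an illustrative name, not part of Mathematica *)
entropyBase[b_, list_List] :=
  With[{p = Tally[list][[All, 2]]/Length[list]},
    -p.Log[b, p]
  ]
N[entropyBase[2, {0, 1, 1, 4, 1, 1}]]
(* 1.25163 *)
N[Entropy[2, {0, 1, 1, 4, 1, 1}]]
(* 1.25163 *)
Changing the base only rescales the result: the base-2 value is the natural-log value divided by Log[2].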
Borrowing from Sjoerd C. de Vries (I noticed this also matches rojolalalalalalalalalalalalala's comment): you don't need to generate a list of random numbers in order to calculate the entropy of a distribution, but you do need one if you want to use Entropy.
Expectation[-Log[PDF[BernoulliDistribution[.2], q]],
q \[Distributed] BernoulliDistribution[.2]]
(* 0.500402 *)
This matches the formula for the entropy of the Bernoulli distribution,
-.2 Log[.2] - .8 Log[.8]
(* 0.500402 *)
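The same Expectation call can be wrapped for reuse. This is only a sketch: distributionEntropy is a hypothetical helper name, and it assumes Expectation can evaluate the sum or integral for the distribution passed in. For continuous distributions the result is the differential entropy, which is a different quantity from what Entropy returns for a sample.
(* distributionEntropy is an illustrative name, not a built-in *)
distributionEntropy[dist_] :=
  Module[{q},
    Expectation[-Log[PDF[dist, q]], q \[Distributed] dist]
  ]
distributionEntropy[BernoulliDistribution[.2]]
(* 0.500402, the same value as above *)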
The Entropy function takes a list, finds the proportion $p_i$ of the list made up by each distinct value, and applies the entropy formula you show to those proportions.
For a random sample from a binomial distribution (here with a single trial, so every value is 0 or 1):
(* Sample size *)
n = 97
(* Take random sample *)
x = RandomVariate[BinomialDistribution[1, 0.5], n]
(* {0,0,1,0,1,1,1,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,1,1,1,0,0,1,1,0,0,0,
0,1,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,1,1,1,
0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,1,1,0,0,0,1,1,0,1,0,0,1,1} *)
(* Calculate entropy *)
Entropy[x]
(* Totals for each unique value *)
x1 = Total[x]
(* 41 *)
x0 = n - Total[x]
(* 56 *)
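With these totals the entropy can be worked out by hand and compared with Entropy[x]. A small sketch, using the counts 41 and 56 from the sample above (pOne, pZero and manual are names introduced here for illustration):
(* Proportions of ones and zeros in the sample *)
pOne = 41/97;
pZero = 56/97;
(* Entropy formula applied to the two proportions *)
manual = -pOne Log[pOne] - pZero Log[pZero];
N[manual]
(* For the sample x above, this should agree with N[Entropy[x]] *)
The value comes out at about 0.68, slightly below Log[2] (about 0.693), which is the maximum for two equally likely outcomes.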
For a random sample from a normal distribution where all values are unique:
n = 97
x = RandomVariate[NormalDistribution[0, 1], n]
Entropy[x]
(* Log[97] *)
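The result is Log[97] because every value in the sample occurs exactly once, so each proportion is 1/97 and the formula reduces to -97 (1/97) Log[1/97] = Log[97]. Any list of 97 distinct elements gives the same answer, whatever distribution it came from, as this short check illustrates:
(* 97 distinct integers instead of 97 distinct reals *)
Entropy[Range[97]]
(* Log[97] *)
(* The same formula written out: 97 equal proportions of 1/97 *)
-97 (1/97) Log[1/97]
(* Log[97] *)
So for a sample of distinct real numbers, Entropy reports only the sample size, not a property of the normal distribution.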