What is maximum entropy?

Before getting to maximum entropy, the term entropy ($H = -\sum_i p_i \log p_i$) should be very clear in your mind. Let me explain entropy via an example directly. Take a coin and flip it. There are two possible outcomes: "head" or "tail". Now, imagine you flip this coin 100 times and at the end you count how many times you got "head" and how many times you got "tail". One of the following cases occurs:

  • You obtain only "head". Then P(head) = 1 and P(tail) = 0. The entropy here is 0.
  • You obtain only "tail". Then P(tail) = 1 and P(head) = 0. The entropy here is 0.
  • You obtain 50% "head" and 50% "tail". Then P(head) = 0.5 and P(tail) = 0.5. The entropy here is log(2).
  • In all other cases, the entropy is strictly between 0 and log(2).

If you compare the four points above, you see that when you are "certain" about the result (you get only heads or only tails), your entropy is 0, which means there is no place for uncertainty. However, when you get log(2), it is because you have only two possible states (tail or head), they share the same probability, and you are not sure at all which one you will get. In this case (log(2)) you have maximum uncertainty, which means the entropy is maximal.
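To see these numbers fall out, here is a minimal Python sketch (the `entropy` helper is my own, not a library function) that evaluates $-\sum_i p_i \log p_i$ for the cases above:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum(p_i * log(p_i)), with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p_i = 0 contribute nothing
    return -np.sum(p * np.log(p))

print(entropy([1.0, 0.0]))   # only heads  -> 0.0
print(entropy([0.0, 1.0]))   # only tails  -> 0.0
print(entropy([0.5, 0.5]))   # fair coin   -> log(2) ~ 0.6931
print(entropy([0.9, 0.1]))   # biased coin -> ~0.325, less than log(2)
```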

So what is Maximum Entropy, really? It is a statistical inference method for constructing a probability distribution from the information we are given about the system. That information takes the form of constraints. The mandatory constraint is that the probabilities sum to 1. You can add other constraints, such as a fixed expected value (mean) or fixed fluctuations (variance):

  • If your only constraint is that the probabilities sum to 1, you get a uniform distribution (like when tail and head have equal probability).
  • If you also fix the mean (for a variable on $[0, \infty)$), you get an exponential distribution.
  • If you fix both the mean and the variance, you get a Gaussian distribution.
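To make this concrete, here is a minimal numerical sketch (my own code, not a standard maxent library): we maximize the entropy of a distribution over the states 0–9 subject to the two constraints sum = 1 and a fixed mean. The optimizer recovers the exponential (Gibbs) form $p_i \propto e^{-\lambda i}$ that the theory predicts:

```python
import numpy as np
from scipy.optimize import minimize

states = np.arange(10)   # discrete states 0..9
target_mean = 3.0        # the mean we constrain the distribution to have

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)   # avoid log(0)
    return np.sum(p * np.log(p))  # minimizing -H maximizes the entropy H

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},           # probabilities sum to 1
    {"type": "eq", "fun": lambda p: p @ states - target_mean},  # fixed expected value
]
bounds = [(0.0, 1.0)] * len(states)
p0 = np.full(len(states), 1.0 / len(states))  # start from the uniform distribution

res = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
p = res.x

# The maxent solution should satisfy p_i ~ exp(-lambda * i), so log(p_i)
# is linear in i; successive differences of log(p) are roughly constant.
print(np.round(p, 4))
print(np.round(np.diff(np.log(p)), 4))
```

If you drop the mean constraint and rerun, the optimizer returns the uniform distribution, matching the first bullet above.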

This method is inspired by physics, and more precisely by thermodynamics; that is why we say it belongs to statistical physics. However, the method by itself does not explain or model physical problems, only statistical ones.

The subject is, however, far from simple. In physics we also talk about Maximum Entropy Production and Minimum Entropy Production, which concern the real "physical" entropy produced by physical/biological/chemical systems, not the statistical entropy of data (there is an analogy, but the distinction should be crystal clear).

Hope this helps ;)


Typically, we have some constraints on a distribution (for example, the marginal distributions), and we seek a max-entropy distribution satisfying these constraints. This basically means that, apart from the known constraints, we know as little as possible about the distribution. We do not want to specify anything about the distribution that is not already implied by the known constraints.
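As a concrete illustration (a sketch with made-up marginals, not a particular library's API): if we fix both marginals of a 2×2 joint distribution, the max-entropy joint is the independent one, i.e. the outer product of the marginals. Any correlated joint with the same marginals has strictly lower entropy:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Fixed marginals for two binary variables
px = np.array([0.7, 0.3])
py = np.array([0.4, 0.6])

# Max-entropy joint under these marginal constraints: the independent joint
p_indep = np.outer(px, py)

# A correlated joint with the same marginals (shift mass along the diagonal)
p_corr = p_indep + np.array([[0.1, -0.1], [-0.1, 0.1]])

# Both joints really do satisfy the marginal constraints
for p in (p_indep, p_corr):
    assert np.allclose(p.sum(axis=1), px) and np.allclose(p.sum(axis=0), py)

print(entropy(p_indep))  # ~1.284: the highest entropy given these marginals
print(entropy(p_corr))   # ~1.167: same marginals, lower entropy
```

Adding a correlation is extra information beyond the marginals, so the max-entropy choice, which adds nothing we do not know, is the independent joint.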