Deriving the canonical ensemble from the microcanonical ensemble: Why expand the logarithm of the probability and not some other function

The applicability of the canonical ensemble depends on the form of $\Omega(E)$. For some $\Omega(E)$, the canonical ensemble can't even be defined, much less derived! The canonical ensemble applies only in systems for which the quantity $$ \frac{\partial \log\Omega(E)}{\partial E} $$ is a decreasing function of $E$ (equivalently, $\log\Omega(E)$ is concave in $E$). The point is that the derivation of the canonical ensemble from the microcanonical ensemble relies on an assumption about the form of the function $\Omega(E)$, as well as on a thermodynamic limit. Together, these two ingredients are the reason why we keep only the first-order term in the expansion of $\log\Omega(E)$ instead of in the expansion of some other function of $\Omega(E)$. This is illustrated below with two explicit examples.

There are systems of interest that do not satisfy these conditions, and for those systems, the canonical ensemble is not applicable, at least not strictly (though it may still be an excellent approximation). So, in a sense, the textbook derivation of the canonical ensemble from the microcanonical ensemble does assume the canonical ensemble! More accurately, the derivation assumes certain conditions that are true of many systems of interest, and those are the conditions under which the canonical ensemble is applicable. The following examples illustrate those conditions.


Example 1: Ideal gas

The entropy of an ideal gas is, up to terms that do not depend on $E$ or $V$, $$ S(E)=N\log V +\frac{ND}{2}\log E \tag{1} $$ where $N$ is the number of atoms, $V$ the total volume, $E$ the total energy, and $D$ the number of spatial dimensions (normally $D=3$). The number of states is $$ \Omega(E)\propto e^{S(E)}. \tag{2} $$ If we partition the system into two parts, a large part $L$ and a small part $S$ (not to be confused with the entropy $S(E)$), then assuming the microcanonical ensemble for $L+S$ with total energy $E$ is equivalent to assigning the probability $$ p(\epsilon)\propto \Omega(E-\epsilon) \tag{3} $$ to each state (each energy eigenstate in the quantum case) of $S$ with energy $\epsilon$. Here $\Omega(E-\epsilon)$ counts the states of the large part $L$, which in the thermodynamic limit is described by the same function (2) as the whole system. Now, consider the expansion $$ S(E-\epsilon)=S(E)-\epsilon\frac{\partial S}{\partial E} +\frac{\epsilon^2}{2}\frac{\partial^2 S}{\partial E^2} +\cdots \tag{4} $$ Equation (1) implies $$ \frac{\partial^n S}{\partial E^n} \propto\frac{N}{E^n} \tag{5} $$ for $n\geq 1$, with a coefficient that is independent of $N$ and $E$. This can also be written $$ \frac{\partial^n S}{\partial E^n} \propto\frac{1}{N^{n-1}(E/N)^n}. \tag{6} $$ The thermodynamic limit is $N\to \infty$ with $E/N$ and $V/N$ held fixed. Here's the key: at fixed $\epsilon$, the $n$-th order term in (4) scales like $N^{1-n}$, so the only $\epsilon$-dependent term in (4) that survives this limit is the $n=1$ term. The $\epsilon$-independent term $S(E)$ is absorbed into the normalization of $p(\epsilon)$, so equation (3) becomes exactly $$ p(\epsilon)\propto \exp(-\beta \epsilon) \hskip2cm \beta := \frac{\partial S}{\partial E}\propto \frac{1}{E/N} \tag{7} $$ in this limit. This is why we expand $S(E)=\log\Omega(E)$ instead of some other function of $\Omega(E)$, at least in the case of an ideal gas.
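As a quick numerical check of the ideal-gas argument, the following minimal sketch compares the exact log-probability ratio $S(E-\epsilon)-S(E)$ from equations (1)-(3) with the canonical value $-\beta\epsilon$ from equation (7), for increasing $N$ at fixed $E/N$. The function names, the choice $E/N=1$, and the test value $\epsilon=2$ are illustrative assumptions, not part of the derivation.

```python
import math

def exact_log_ratio(eps, N, e_per_particle=1.0, D=3):
    """log[p(eps)/p(0)] = S(E - eps) - S(E), using equations (1)-(3)."""
    E = N * e_per_particle  # total energy, with E/N held fixed
    # Only the (N*D/2)*log(E) term of equation (1) depends on the energy.
    return 0.5 * N * D * math.log1p(-eps / E)

def canonical_log_ratio(eps, e_per_particle=1.0, D=3):
    """log[p(eps)/p(0)] = -beta*eps, with beta = dS/dE = D / (2 E/N)."""
    beta = D / (2.0 * e_per_particle)
    return -beta * eps

if __name__ == "__main__":
    eps = 2.0  # hypothetical energy of the small part; must satisfy eps < E
    for N in (10, 100, 1000, 10**6):
        exact = exact_log_ratio(eps, N)
        approx = canonical_log_ratio(eps)
        print(f"N={N:>8}  exact={exact:.6f}  canonical={approx:.6f}  "
              f"difference={exact - approx:.2e}")
```

Under these assumptions, the difference between the two columns should shrink roughly like $1/N$, which is the $n=2$ term of (4) dying off in the thermodynamic limit.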


Example 2: Photon gas

Now suppose $$ S(E)=\left( \left(\frac{E}{\hbar c}\right)^D V\right)^{1/(D+1)}. \tag{8} $$ Up to a numerical factor, this is the entropy of a gas of photons. The point of considering this example is that the canonical ensemble still applies even though this system does not involve any given number of particles $N$. To define the thermodynamic limit, we can use $V\to \infty$ with $E/V$ fixed. Equations (2)-(4) still apply here, and equation (5) is replaced with $$ \frac{\partial^n S}{\partial E^n} \propto \frac{\big( E^D V\big)^{1/(D+1)}}{E^n} \propto V^{1-n} (E/V)^{D/(D+1)-n} \tag{9} $$ with a coefficient that is independent of $E$ and $V$. At fixed $\epsilon$ and fixed $E/V$, the $n$-th order term in (4) therefore scales like $V^{1-n}$. Once again, the only $\epsilon$-dependent term in (4) that survives this limit is the $n=1$ term, so equation (3) becomes exactly $$ p(\epsilon)\propto \exp(-\beta \epsilon) \hskip2cm \beta := \frac{\partial S}{\partial E}\propto (V/E)^{1/(D+1)}. \tag{10} $$ This is why we expand $S(E)=\log\Omega(E)$ instead of some other function of $\Omega(E)$, at least in the case of a photon gas.
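The same kind of check can be sketched for the photon gas. The snippet below assumes units with $\hbar c=1$, drops any numerical prefactor in (8), and uses an illustrative energy density $E/V=1$; the function names are hypothetical. It compares $S(E-\epsilon)-S(E)$ with $-\beta\epsilon$, where $\beta=\frac{D}{D+1}(V/E)^{1/(D+1)}$ follows from differentiating (8).

```python
import math

def photon_exact_log_ratio(eps, V, u=1.0, D=3):
    """S(E - eps) - S(E) for S = (E**D * V)**(1/(D+1)), with hbar*c = 1 (assumed units)."""
    E = u * V                    # total energy, with energy density u = E/V held fixed
    a = D / (D + 1.0)            # exponent of E in S(E)
    S_E = (E**D * V) ** (1.0 / (D + 1.0))
    # S(E - eps) = S(E) * (1 - eps/E)**a; expm1/log1p avoid cancellation at large V.
    return S_E * math.expm1(a * math.log1p(-eps / E))

def photon_canonical_log_ratio(eps, u=1.0, D=3):
    """-beta*eps, with beta = dS/dE = (D/(D+1)) * (V/E)**(1/(D+1))."""
    beta = (D / (D + 1.0)) * u ** (-1.0 / (D + 1.0))
    return -beta * eps

if __name__ == "__main__":
    eps = 2.0  # hypothetical energy of the small part; must satisfy eps < E
    for V in (10.0, 1e3, 1e6):
        exact = photon_exact_log_ratio(eps, V)
        approx = photon_canonical_log_ratio(eps)
        print(f"V={V:>10.0f}  exact={exact:.8f}  canonical={approx:.8f}  "
              f"difference={exact - approx:.2e}")
```

Here the difference should fall off roughly like $1/V$, matching the $V^{1-n}$ scaling in (9) for $n=2$.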


Summary

The point is that the canonical ensemble is strictly applicable only in a thermodynamic limit and only for a certain class of functions $\Omega(E)$. Many systems of interest do satisfy those conditions, and that's the justification for keeping only the first-order term in the expansion of $\log\Omega(E-\epsilon)$ rather than in the expansion of some other function of $\Omega(E -\epsilon)$.
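To see concretely why the logarithm is singled out, one can compare truncating the expansion of $\log\Omega$ with truncating the expansion of $\Omega$ itself. The second-order term in the expansion of $\Omega(E-\epsilon)/\Omega(E)$ involves $\Omega''/\Omega = S'' + (S')^2$, and the $(S')^2=\beta^2$ piece does not vanish in the thermodynamic limit, so the first-order truncation $1-\beta\epsilon$ never becomes exact. The sketch below illustrates this for the ideal-gas form (1); the function name and the parameter values are illustrative assumptions.

```python
import math

def ratios(N, eps=1.0, e_per_particle=1.0, D=3):
    """Return (exact, first order in log Omega, first order in Omega) for Omega(E-eps)/Omega(E)."""
    E = N * e_per_particle                    # total energy at fixed E/N
    beta = D / (2.0 * e_per_particle)         # beta = dS/dE for the ideal gas
    exact = (1.0 - eps / E) ** (0.5 * N * D)  # from Omega ~ E**(N*D/2)
    from_log = math.exp(-beta * eps)          # keep only the n=1 term of log(Omega)
    from_omega = 1.0 - beta * eps             # keep only the n=1 term of Omega itself
    return exact, from_log, from_omega

if __name__ == "__main__":
    for N in (10, 1000, 10**6):
        exact, from_log, from_omega = ratios(N)
        print(f"N={N:>8}  exact={exact:.6f}  "
              f"expand log(Omega)={from_log:.6f}  expand Omega={from_omega:.6f}")
```

With these illustrative values ($\beta\epsilon=1.5$), the direct truncation of $\Omega$ even gives a negative "probability ratio", while the truncation of $\log\Omega$ approaches the exact ratio as $N$ grows.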