How to count microstates in quantum statistical mechanics
Summary: the question "what is the probability of state $|\psi \rangle$" is the wrong question to ask, because it's experimentally unobservable. What you really care about is the results of measurements, and the number of possible different measurement outcomes is equal to the number of microstates.
Consider a bunch of noninteracting spin 1/2 particles in thermal equilibrium, and look at a single spin inside.
On the classical level, at thermal equilibrium, the spin of this particle should be a vector with random direction. Now we might ask, on the quantum level, what's the state of this particle?
However, this is the wrong question to ask. Given any spinor $|\psi \rangle$, it's possible to find an axis $\mathbf{n}$ so that $|\psi\rangle$ is the positive eigenstate of $S_{\mathbf{n}}$, i.e. the spin is definitely up along $\mathbf{n}$. So individual elements of the Hilbert space are insufficient to represent a thermal state.
Instead, we need to do the same thing we did in the classical case: replace the state of a system (i.e. $(\mathbf{x}, \mathbf{p})$ classically and $|\psi \rangle$ in quantum) with a probability distribution over those states. One such probability distribution is $$50\% \text{ chance } |\uparrow \rangle, \quad 50\% \text{ chance } |\downarrow \rangle.$$ The miraculous thing is that this distribution is rotationally invariant! That is, the distributions $$50\% \text{ chance } | \mathbf{n} \rangle, \quad 50\% \text{ chance } |-\mathbf{n}\rangle$$ are indistinguishable by measurement for any direction $\mathbf{n}$. Every spin measurement of any $\mathbf{n}$ ensemble, along any direction $\mathbf{m}$, gives a 50/50 result.
This tells us that the correct description of a thermal ensemble of quantum particles can't be an assignment of probabilities to quantum states, because such a construction isn't unique: there are many probability distributions that are experimentally indistinguishable. (The quantity that corresponds to probability distributions "up to distinguishability" is called the density matrix $\rho$.)
With this setup, the answers to your questions are:
- Why don't we consider superpositions? We do! You can't directly ask "what is the probability the system is in state $|\psi \rangle$", because this is not an experimentally observable quantity, as explained above. But you can perform a measurement of an operator $\hat{A}$ with eigenvector $|\psi \rangle$ and eigenvalue $\lambda$, and there's some probability you get $\lambda$.
- Why do we count microstates? This is the dimension of our Hilbert space, or more physically, the number of different numbers you can get while measuring $\hat{A}$. In the example above, the probability of getting $\lambda$ is $1/2$ where $2$ is the number of microstates. We aren't privileging the spin up or spin down states, because you get a 50/50 chance for any spin operator $S_{\mathbf{n}}$.
- Why do we count specifically energy eigenstates? This is just for convenience. What we really want is the dimension of the Hilbert space, and $\hat{H}$ is often easy to deal with/reason about.
First of all, independently of the ensamble, if you want a density matrix that describes a state of equilibrium, then it has to commute with the hamiltonian (see Liouville theorem for QM). This means that both the density matrix and the hamiltonian can be diagonalized in the same basis and, thus, the system is indeed in a statistichal mixture of energy eigenstates.
Now, if the system is in the microcanonical ensemble then "counting microstates" means counting how many non-zero terms does the density matrix has in its diagonal. That's why you don't take into account other possible states like superpositions of energy eigenstates. You need to count the diagonal terms of the density matrix written in a particular basis that diagonalizes the hamiltonian too.