Chemical potential
consider the grand canonical ensemble, $$ \rho \sim \exp[-\beta (E-\mu N)] $$ In the exponent, the inverse temperature $\beta = 1/kT$ is the coefficient in front of one conserved quantity, the (minus) energy, while another coefficient, $\beta\mu$, is in front of the number of particles $N$. The chemical potential is therefore the coefficient in front of the number of particles - except for the extra $\beta$. The number of particles has to be conserved as well if $\mu$ is non-zero.
As long as the sign of $N$ is well-defined, the sign of the chemical potential is well-defined, too.
Now, for bosons, $\mu$ can't be positive because the distribution would be an exponentially increasing function of $N$. Note that in the grand canonical ensemble - which is really the ensemble in which $\mu$ is sharply defined - the dual variable to $\mu$, namely $N$, is not sharply defined. However, the probability for ever larger - infinite $N$ would be larger, so the distribution would be peaked at $N=\infty$. Such a distribution couldn't be well-defined. We want, in the thermodynamic limit, the grand canonical ensemble, while assuming a fixed $\mu$, also generate a finite and almost well-defined $N$, within an error margin that goes to zero in the thermodynamic limit. That couldn't happen for bosons and a positive $\mu$.
This catastrophe would be possible because $N=\sum_i n_i$, a summation over microstates $i$, and every $n_i$ can be an arbitrarily high integer for bosons. For fermions, the problem doesn't occur because $n_i=0$ or $1$ for each state $i$. So for fermions, we can't argue that $\mu$ has to be positive. Note that $E-\mu N$ in the exponent is the sum $\sum_i N_i (e_i-\mu)$. For bosons, the problem occurred for states for which $e_i-\mu$ was negative i.e. $e_i$ was low enough. For fermions, however, the number of such states - and therefore the maximum number of fermions in them - is finite, so the divergence doesn't occur if the chemical potential is positive. For fermions, positive $\mu$ is OK.
In fact, for fermions, both positive and negative $\mu$ is OK. Also, it is easy to see that if both particles and antiparticles exist, $\mu$ of the antiparticle has to be minus $\mu$ of the particle because only the difference $N_{\rm particles} - N_{\rm antiparticles}$ is conserved; this is true both for bosons and fermions.
So if the potential for electrons is positive, the potential for positrons or holes (which play the very same role) has to be negative, and vice versa.
Nothing changes about the meaning of the chemical potential when one switches from classical physics to quantum physics: in fact, above, I was assuming that there are "discrete states" for the particles, just like in quantum physics - otherwise we wouldn't be talking about bosons and fermions which are only relevant in the quantum setup. Classical physics is a limit of quantum physics in which the number of states is infinite because $\hbar$ goes to zero, so a finite number of particles never ends up in "exactly the same state". In some sense, Ludwig Boltzmann, while working in the context of classical statistical physics, was inherently using the thinking and intuition of quantum statistical physics - he was a truly ingenious "forefather" of quantum physics.
In relativity, one has to be careful how we define the energy of a state. Note that the physically meaningful combination that appears in the exponent is $e_i-\mu$, so if one shifts $e_i$ e.g. by $mc^2$, the latent energy, one has to shift $\mu$ in the same direction by the same amount. The notions of chemical potential obviously work in relativity, too. Relativistic physics is not a "completely new type of physics". It is just a type of the old physics that happens to respect a symmetry - the Lorentz symmetry.
Again, in quantum field theory which combines both quantum mechanics and relativity, statistical physics including the notion of the chemical potential also works but one must be careful that particle-antiparticle pairs may be created with enough energy. That implies $\mu_{p}=-\mu_{np}$, as I said.
There can't be any general ban on a negative chemical potential of fermions: fermionic $\mu$ can have both signs. However, in the particular theory that Allan wanted to describe, he could have had more detailed reasons why $\mu$ should have been positive for his fermions. I am afraid that this would be an entirely different, more specific question - one about superconductors. As stated, your question above was one about statistical physics and I tend to believe that the text above exhausts all universal facts about the sign of the chemical potential in statistical physics.
I think the above answers are good and correct. They are just too long for my taste. Here is a more "simple" answer (according to me of course). Only for non-interacting particles to keep it simple:
When we talk about chemical potential, we are talking about a system in contact with a huge reservoir of particles. These particles can go and come from the reservoir to our system. The reservoir (free) energy per particle is the chemical potential $\mu$.
We assume equilibrium: the sub-system is in equilibrium with the reservoir. This means the combined system has lowest free energy (or highest entropy), and that is how the particles arrange themselves.
Now lets' say the lowest energy state of the subsystem is $E_0$ into which particles can go. We measure chemical potential $\mu$ with respect to this number. I mean the value "zero" for chemical potential $\mu$ is sort of a convention where we assume the lowest energy state has value zero.
To lower the total free energy, a particle from the reservoir would happily go into this state if $E_0$ < $\mu$. For fermions, this just fills the state $E_0$ and then we would move on to the next state of higher energy $E_1$; if $E_1$ < $\mu$, fill it and keep going. So you see that at some point you stop since $E_n$ > $\mu$ for some $n$. But for bosons there is trouble If $E_0$ < $\mu$, you can keep putting more and more particles into $E_0$ from the reservoir, each transfer lowers total free energy, and you keep going and going... until the reservoir is empty which makes no sense. So basically this situation is not tenable with an actual reservoir. Either $\mu$ < $E_0$ and you have finite number of particles in $E_0$, or you can't have a reservoir and you get some fixed population of particles in $E_0$. You can't have a reservoir in contact with a system where $\mu$ > $E_0$. It just is contradictory. In practice what it means, is $\mu$ can be quite close to $E_0$ and one would get macroscopic occupation of the ground-state (Bose condensation) but not exactly equal to or larger in order to have a reservoir with which to exchange particles.
I guess that sums it up for me.
In ordinary substances, we have the system in thermal equilibrium with the environment, and with a fixed number of particles. However we go to the grand canonical system for efficiency, where we do not have to restrict ourselves to a fixed particle number $N_{0}$, and that simplifies calculations enormously. In this, the system is coupled to the environment which is kept at a fixed Temperature $T$ and chemical potential $\mu$.
In reality, in our actual problem,we are in a "canonical" system, and the quantities $T$ and $N_{0}$ are fixed. However we go to the grand canonical system, and work with a chemical potential $\mu$, along with the fixed $T$, that makes sure to produce an $\textbf{average particle number} \ <N>$ equal to $N_{0}$, in the grand canonical system. Thus, the chemical potential is brought forth in an artificial way.
In a way, we have two degrees of freedom, actually $T$ and $N_{0}$, but we chose to work with independent parameters $T$ and $\mu$ while getting our results from the grand canonical theory.
Now, in the grand canonical ensemble, each state characterized by a particular total energy $E$ and total particle number $N$, has the probability of appearing
\begin{equation} \rho\sim e^{-\beta E+\beta \mu N} \end{equation}
Now, let's deal with fermions first: as $T\rightarrow 0$, the probability of having higher energy states becomes negligible. However, with $\mu>0$ as $T\rightarrow 0$, we the probabilities become exponentially larger as we keep adding more particles. Combining the two facts, we predominantly end up with states with simultaneously "low" total energy and with "high" number of particles. (The use of the terms "high" and "low" is subjective, but should be clear from context.). But this does not give a divergingly large contribution for fermions, because with a particular "small" total energy, due to the Pauli Exclusion Principle, the total number of particles cannot exceed a certain value. Hence as $T\rightarrow 0$, we're required to have a positive value of $\mu$. There are only so many states which give a meaningful contribution.[See note in the end.]
If we end up averaging over the states that do contribute, we'll find that they produce a given average number of particles $N_{0}$, with a suitable choice of $\mu(T=0)$. (That is of course, exactly what is done).
Now as the temperature $T$ increases, higher energy states start becoming more probable, and if $\mu$ were to stay constant or increase, we would start having a problem, as there would be more states with more energy and more number of particles that give a meaningful contribution, and thus the average number of particles $<N>$ in the system would shoot up. But of course, the whole point is to keep the particle number $<N>=N_{0}$,a constant. Thus we have to postulate that $\mu$ decreases and becomes negative,(very rapidly, as we'll argue in a moment).
When $T$ is very large and thus $\beta$ is very small , we argue that $\mu$ is negative , which we write as $\mu=-|\mu|$, and thus, now,
\begin{equation} \rho\sim e^{-\beta E-\beta |\mu| N} \end{equation}
As $\beta\rightarrow 0$, higher energy states are more probable, but we need to ensure that $|\mu|$ increases at a tremendously fast rate so that $-\beta|\mu|$ is a very large negative quantity(even when $\beta\rightarrow 0$) so that states with very high number of particles give a very negligible contribution. Thus only states with a broad spectrum of energies and simultaneously with low particle number contribute. For this to be possible,again, this would give on average , the same value for the average number of particles $<N>=N_{0}$ in the system( see the Note in the end).
Thus , for all temperatures, to keep the particle number constant, essentially, we end up varying $\mu$ suitably.
The high temperature limit argument is the same for bosons as well, but as pointed out in Lubos Motl's answer, when $T\rightarrow 0$, we absolutely cannot have a positive value of $\mu$ because the Pauli Exclusion Principle does not hold for bosons, and we would get a violent divergence for the average $<N>$ as $T\rightarrow 0$. Thus, for bosons, we need to start with a value of $\beta\mu=0$ as $\beta\rightarrow \infty$.
[In fact, there is divergence nonetheless for $T\leq T_{critical}$ and stricty speaking our formalism holds only for temperature values above the (very small)critical value. $\beta\mu\approx 0$ at $T=T_{critical}$, and formally, it is also taken $0$ for all values of $T$ below the critical value. For example, as shown in Pathria Figure 7.2.]
Thus overall, in case of both fermions and bosons, we can use the average particle number $<N>$ that we get, as the value of the actual fixed value of $N_{0}$ in the original problem.
[Note: Here , we basically are using a density of states argument for states simultaneously at a particular total energy $E$ and particular total $N$ value. After we get the probability of the state appearing from the standard probability of states formula (which is the result of entropy maximisation, in a way), we need to find out how many such states are actually there, from the density of states, as a function simultaneously of $E$ and $N$. That can in principle be estimated from the individual particle density of states, essentially a combinatorial argument].