Picard's Little Theorem Proofs
There is an essentially elementary proof that can be presented to an audience having only a little background in complex analysis. Apart from miraculous trickery and some simple estimates, the only ingredients are Cauchy's integral formula and the existence of holomorphic logarithms on simply connected domains.
A much more complete exposition can be found in Remmert, Classical Topics in Complex Function Theory, Springer GTM 172, chapter 10. Let me emphasize: the following is only a distillate of the parts from Remmert's chapter 10 needed for a proof of Picard's little theorem. Said chapter contains a lot more: extensive historical remarks and references, variants of the proofs and further developments, improvements of the results, some nice applications of Picard's theorem and it culminates in a proof of Picard's great theorem.
I once presented this argument to a group of talented students in a two-hour “Christmas special lecture” and I think it worked quite well, but admittedly it is ambitious and the argument is flabbergasting at various points.
The main ingredient in the proof is the amazing:
Theorem (Bloch). If $f$ is holomorphic in a neighborhood of the closed unit disk $\overline{\mathbb D}$ and $f'(0) = 1$ then $f(\mathbb{D})$ contains a disk of radius $\frac{3}{2} - \sqrt 2 \gt 0$.
Remmert prefaces the section containing this result by a statement of J.E. Littlewood:
One of the queerest things in mathematics, ... the proof itself is crazy.
I'll give a proof at the end of this answer.
The way this is applied is:
Exercise. If $f: \mathbb{C} \to \mathbb{C}$ is holomorphic and non-constant then $f(\mathbb{C})$ contains disks of arbitrary radius.
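Hint: If $f'(0) \neq 0$ then for every $r \gt 0$ the function $g(z) = \frac{f(rz)}{r f'(0)}$ satisfies the hypothesis of Bloch's theorem (note $g'(0) = 1$), and $f(r\mathbb{D}) = r f'(0) \cdot g(\mathbb{D})$. If $f'(0) = 0$, first replace $f$ by $z \mapsto f(z+a)$ for some point $a$ with $f'(a) \neq 0$.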
There's a second tricky ingredient, due to Landau and refined by König:
Let $G \subset \mathbb{C}$ be a simply connected domain and let $f: G \to \mathbb{C}$ be holomorphic. If $f(G)$ does not contain $0$ and $1$ then there is a holomorphic $g: G \to \mathbb{C}$ such that $$f = \frac{1}{2}\big(1+ \cos{(\pi\cos{(\pi g)})}\big).$$ Moreover, if $g$ is any such function then $g(G)$ does not contain a disk of radius one.
Simple connectedness is used in the guise of the existence of roots and logarithms of holomorphic functions omitting the value $0$. Let us show first that for a holomorphic function $h$ on a simply connected domain $G$ such that $\pm 1 \notin h(G)$ there is a holomorphic $H:G \to \mathbb{C}$ such that $h = \cos{H}$: The trick is that $1-h^2$ has no zero, hence there exists a holomorphic $k$ such that $k^2 = 1-h^2$, so $1 = h^2 + k^2 = (h+ik)(h-ik)$. But this means that $h+ik$ doesn't have a zero either, hence it has a logarithm: $h+ik = e^{iH}$, so $h-ik = e^{-iH}$ and thus $h = \frac{1}{2}(e^{iH}+e^{-iH}) = \cos{H}$. Applying this to $h = 2f-1$ (which leaves out the values $\pm 1$ by hypothesis) and writing $H = \pi F$ we get an $F$ such that $2f-1=\cos{(\pi F)}$. But $F$ must leave out all integer values in its range (otherwise $2f-1$ would take the value $\pm 1$), in particular the values $\pm 1$, so applying the construction once more gives $F = \cos{(\pi g)}$, and unwinding the construction gives us the desired $f=\frac{1}{2}\big(1+ \cos{(\pi\cos{(\pi g)})}\big)$.
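The algebra behind $h = \cos H$ can be sanity-checked pointwise. The following is only a sketch: the helper name `arccos_via_log` and the sample points are made up for illustration, and the check says nothing about choosing $k$ and $H$ holomorphically on all of $G$, which is where simple connectedness is needed.

```python
import cmath

def arccos_via_log(h: complex) -> complex:
    """Return some H with cos(H) = h, mimicking the construction in the text."""
    k = cmath.sqrt(1 - h * h)           # one square root of 1 - h^2
    # h + ik is never zero because (h + ik)(h - ik) = 1
    return -1j * cmath.log(h + 1j * k)  # so that h + ik = exp(iH)

# Hypothetical sample values of h, chosen arbitrarily
samples = [0.3 + 0.7j, -2.5 + 0.1j, 4.0 - 3.0j, 0.0 + 0.0j]
errors = [abs(cmath.cos(arccos_via_log(h)) - h) for h in samples]
print(max(errors))  # should be at the level of rounding error
```

Any branch of the square root and of the logarithm works at a single point, since the identity $(h+ik)(h-ik) = 1$ does not depend on the choice.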
The “moreover” part follows from the observation that $g(G)$ must not hit the set $$A = \left\{m \pm \frac{i}{\pi} \log{\big(n+\sqrt{n^2 - 1}\big)},\;m\in\mathbb{Z},\;n \in \mathbb{N}\smallsetminus\{0\}\right\}$$ since for $a \in A$ we have $\cos{(\pi a)} = (-1)^m \cdot n$ by a short calculation. Thus $\cos{(\pi\cos{(\pi a)})} = \pm 1$ and if there were $z \in G$ such that $g(z) \in A$ we would have $f(z) \in \{0,1\}$ contradicting the assumptions. It is not hard to convince oneself that every point $w \in \mathbb{C}$ is within distance $\lt 1$ of some point of $A$ (a picture would help!), hence $g(G)$ can't contain a disk of radius $1$.
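For readers who prefer a computer to a picture, here is a rough numerical check of the two claims about $A$. It is a sketch only: the function names and the ranges of $m$ and $n$ are arbitrary choices of mine, and the covering bound assumes that the vertical gaps between consecutive rows of $A$ keep shrinking as $n$ grows (which holds because $\log{(n+\sqrt{n^2-1})} = \operatorname{arcosh} n$ is concave in $n$).

```python
import cmath
import math

def point_of_A(m, n, sign):
    """The point m + sign * (i/pi) * log(n + sqrt(n^2 - 1)) of the set A."""
    return complex(m, sign * math.log(n + math.sqrt(n * n - 1)) / math.pi)

def worst_cos_error(max_m=3, max_n=6):
    """Largest deviation of cos(pi * a) from (-1)^m * n over a finite part of A."""
    worst = 0.0
    for m in range(-max_m, max_m + 1):
        for n in range(1, max_n + 1):
            for sign in (1, -1):
                a = point_of_A(m, n, sign)
                worst = max(worst, abs(cmath.cos(math.pi * a) - (-1) ** m * n))
    return worst

def covering_bound(max_n=50):
    """Upper bound for the distance from any point of C to A.

    Horizontally the points of A are spaced 1 apart (distance at most 1/2);
    vertically the distance is at most half of the largest gap between rows.
    """
    heights = [math.acosh(n) / math.pi for n in range(1, max_n + 1)]
    largest_gap = max(b - a for a, b in zip(heights, heights[1:]))
    return math.hypot(0.5, largest_gap / 2)

worst = worst_cos_error()
bound = covering_bound()
print(worst, bound)  # bound should come out well below 1
```

The largest gap is the one between the real axis ($n = 1$) and the row $n = 2$, so the bound is determined by the first two rows.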
Armed with these two ingredients the proof of Picard's little theorem is immediate:
Picard's little theorem. If there exist two complex numbers $a,b$ such that $f: \mathbb{C} \to \mathbb{C}\smallsetminus \{a,b\}$ is holomorphic then $f$ is constant.
Proof. We may assume $\{a,b\} = \{0,1\}$. Since $\mathbb{C}$ is simply connected, the Landau–König theorem gives $f(z) = \frac{1}{2}\big(1+ \cos{(\pi\cos{(\pi g)})}\big)$ for some entire $g$ whose image does not contain a disk of radius $1$. By the exercise to Bloch's theorem $g$ must be constant, and hence so is $f$.
Now for the proof of Bloch's theorem:
Lemma. Let $f$ be holomorphic in a neighborhood of the closure of the disk $D = B_r(a)$ and assume that $|f'(z)| \leq 2|f'(a)|$ for $z \in D$. Put $\rho = (3-2\sqrt{2})\cdot r \cdot |f'(a)|$. Then $B_{\rho}(f(a)) \subset f(D)$.
Proof. Assume $a = f(a) = 0$ for simplicity of notation and write $C = \sup\limits_{z \in D}{\,|f'(z)|}$.
Put $A(z) = f(z) - f'(0)\cdot z$. Then $A(z) = \int_{0}^{z} (f'(w) - f'(0))\,dw$, so $$|A(z)| \leq \int_{0}^{1} |f'(zt) - f'(0)|\,|z|\,dt.$$ For $d \in D$ we have by Cauchy's integral formula $$f'(d) - f'(0) = \frac{d}{2\pi i} \int_{|w|= r} \frac{f'(w)}{w(w-d)} \,dw,$$ hence $$|f'(d) - f'(0)| \leq \frac{|d|}{r - |d|} C$$ and thus $$|A(z)| \leq \int_{0}^{1} \left(\frac{|zt|}{r - |zt|}C\right)|z|\,dt \leq \frac{1}{2} \frac{|z|^2}{r - |z|} C.$$ Let $x = |z| \in (0,r)$ and observe that $|f(z) - f'(0)z| \geq |f'(0)| x - |f(z)|$. The last inequality together with the hypothesis $C \leq 2 |f'(0)|$ gives $$|f(z)| \geq \underbrace{\left(x - \frac{x^2}{r- x}\right)}_{h(x)} |f'(0)|.$$ Now $h(x)$ assumes its maximum $(3 - 2\sqrt{2})r$ at the point $\tilde{x} = (1-\frac{\sqrt{2}}{2})r$. Thus we have shown that for $|z| = \tilde{x}$ we have $$|f(z)| \geq (3 - 2\sqrt{2})\cdot r \cdot |f'(0)| = \rho.$$ But this implies that $B_{\rho}(f(0)) \subset f(B_{\tilde{x}}(0)) \subset f(D)$. Why? If $f'(0) = 0$ then $\rho = 0$ and there is nothing to prove. Otherwise $f$ is non-constant, so $f(B_{\tilde{x}}(0))$ is open by the open mapping theorem, and its boundary is contained in $f(\partial B_{\tilde{x}}(0))$, which lies outside the ball $B_{\rho}(f(0))$ by the estimate just shown. Hence $B_{\rho}(f(0)) \cap f(B_{\tilde{x}}(0))$ is both open and closed in $B_{\rho}(f(0))$, and it is non-empty since it contains $f(0) = 0$. As $B_{\rho}(f(0))$ is connected, it is entirely contained in $f(B_{\tilde{x}}(0))$.
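The claim about the maximum of $h(x) = x - \frac{x^2}{r-x}$ is a one-variable calculus exercise (setting $h'(x) = 0$ leads to $2x^2 - 4rx + r^2 = 0$), but it can also be checked numerically. The sketch below takes $r = 1$ and uses a plain grid search; the grid size is an arbitrary choice.

```python
import math

def h(x, r=1.0):
    """The function h(x) = x - x^2 / (r - x) from the proof of the lemma."""
    return x - x * x / (r - x)

r = 1.0
x_tilde = (1 - math.sqrt(2) / 2) * r       # claimed maximiser
claimed_max = (3 - 2 * math.sqrt(2)) * r   # claimed maximum value

# Brute-force search over a fine grid in the open interval (0, r)
grid = [r * k / 100000 for k in range(1, 100000)]
grid_max = max(h(x, r) for x in grid)

print(h(x_tilde, r), claimed_max, grid_max)
```

The three printed numbers should agree up to the resolution of the grid.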
Proof of Bloch's theorem. Assume that $f$ is holomorphic in a neighborhood of the closed unit disk and assume that $f'(0) = 1$. Consider the function $z \mapsto |f'(z)|(1-|z|)$. It is continuous on $\overline{\mathbb{D}}$, vanishes on the unit circle and equals $1$ at $z = 0$, so it takes on its maximum at some point $p \in \mathbb{D}$. Putting $t = \frac{1}{2}(1-|p|)$ we have $B_{t}(p) \subset \mathbb{D}$ and $1-|z| \geq t$ for all $z \in B_{t}(p)$. By the choice of $p$ we have $|f'(z)|(1-|z|) \leq |f'(p)|(1-|p|) = 2t|f'(p)|$ and hence $|f'(z)| \leq 2|f'(p)|$ for all $z \in B_t(p)$. Moreover, comparing the maximum with the value at $z = 0$ gives $2t|f'(p)| \geq |f'(0)| = 1$. Hence the lemma gives us $B_{\rho}(f(p)) \subset f(B_t(p)) \subset f(\mathbb{D})$ for $\rho = (3-2\sqrt{2})\, t\, |f'(p)| \geq \frac{1}{2}(3-2\sqrt{2}) = \frac{3}{2} - \sqrt{2}$.
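To see the argument in action, here is a small numerical illustration for the sample function $f(z) = z + z^2$ (my choice, not from Remmert), which satisfies $f'(0) = 1$. The sketch locates the maximiser $p$ of $|f'(z)|(1-|z|)$ by a crude polar grid search and then evaluates the radius from the lemma, $\rho = (3-2\sqrt{2})\cdot r\cdot|f'(a)|$ with $a = p$ and $r = t$. The function name `bloch_radius` and the grid sizes are illustrative assumptions.

```python
import cmath
import math

def bloch_radius(df, n_r=400, n_theta=360):
    """Grid-search the maximiser p of |f'(z)|(1 - |z|) on the unit disk,
    then return (p, t, rho) with t = (1 - |p|)/2 and the lemma's radius rho."""
    best_val, p = -1.0, 0j
    for i in range(n_r):
        radius = i / n_r
        for j in range(n_theta):
            z = cmath.rect(radius, 2 * math.pi * j / n_theta)
            val = abs(df(z)) * (1 - abs(z))
            if val > best_val:
                best_val, p = val, z
    t = 0.5 * (1 - abs(p))
    rho = (3 - 2 * math.sqrt(2)) * t * abs(df(p))
    return p, t, rho

# f(z) = z + z^2, so f'(z) = 1 + 2z
p, t, rho = bloch_radius(lambda z: 1 + 2 * z)
print(p, t, rho, 1.5 - math.sqrt(2))
```

For this $f$ the maximum lies on the positive real axis, because $|1+2z| \leq 1 + 2|z|$ with equality there, and $(1+2x)(1-x)$ is largest at $x = \frac14$. The resulting $\rho$ is somewhat larger than the universal bound $\frac32 - \sqrt2$, as the proof predicts.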
The basic argument behind Picard's little theorem is the following:
If $D$ denotes the open unit disk in $\mathbb C$, then there is a holomorphic map $\pi: D \to \mathbb C\setminus \{a,b\}$ which is a covering map in the sense of topology. (Another way to say this is that this map gives a concrete realization of $D$ as the universal cover of $\mathbb C \setminus \{a,b\}$.)
Now if $f: \mathbb C \to \mathbb C\setminus \{a,b\},$ then since $\mathbb C$ is simply connected, covering space theory implies that $f$ lifts to a holomorphic map $\tilde{f}: \mathbb C \to D$, which then must be constant, by Liouville.
If one wants to present this proof in a class, then the hardest part is to construct the map $\pi$. (The covering space arguments can be made without appeal to general theory, by instead constructing the map $\tilde{f}$ concretely via a contour integral.)
In constructing $\pi$, without loss of generality one may assume that $\{a,b\} = \{0,1\}$. One may also replace the disk $D$ by the upper half-plane (since they are conformally isomorphic). The corresponding map $\pi$ was then first discovered in the context of the theory of elliptic integrals and modular functions; indeed, it can be taken to be a certain modular function traditionally denoted $\lambda$. My memory is that Rudin constructs $\lambda$ by hand, and proves enough about it to have the above argument go through; however, I found his argument somewhat mysterious when I first read it, because most of the effort is spent in constructing $\lambda$, and it is not made terribly clear why it exists, or why the theorem pivots on such a construction.
On the other hand, the map $\pi$ also exists for general reasons, namely the uniformization theorem. (The Riemann surface $\mathbb C \setminus \{a,b\}$ has negative Euler characteristic, and so its universal cover is the open unit disk.)
Unfortunately, the uniformization theorem is probably outside the scope of a one-semester complex analysis course, and I don't know a proof of the uniformization theorem in this particular case that doesn't hinge on the explicit construction of $\lambda$. (Perhaps it's also worth noting that the theory of modular functions such as $\lambda$ played an important historical role in the development of that branch of the theory of Riemann surfaces that culminated in the uniformization theorem.)
There is a proof in Ahlfors's book "Complex Analysis", in the chapter on global analytic functions. The proof uses analytic continuation, the monodromy theorem and a modular function "$\lambda$".
The monodromy theorem can be stated without proof if the audience is unfamiliar with it. The modular function is not so hard to define. Besides, this proof is instructive in the kind of methods it uses.
I will type in a succinct version of this proof when I have more time. For now, the reference is chapter 8, thm 5. If somebody knows how to link to google books or some such place, it will be even better.