No maximum(minimum) of rationals whose square is lesser(greater) than $2$.

The point here is that iterations of the Möbius transformation $p \mapsto \frac{2p+2}{p+2}$ converge to $\sqrt{2}$, so every time you apply the transformation you get closer to $\sqrt{2}$. This can be thought of as describing a generalized continued fraction

$$2 - \cfrac{2}{4 - \cfrac{2}{4 - \cfrac{2}{\cdots}}}$$

for $\sqrt{2}$. The dynamics of Möbius transformations in general are fairly well-understood; for the transformation $p \mapsto \frac{ap+b}{cp+d}$ the possible fixed points are the roots of the quadratic polynomial $cp^2 + dp = ap + b$, and using the Banach fixed point theorem (or a more specific closed form of iterations of Möbius transformations using linear algebra) one can determine which of the fixed points are attractive or repellent.

So if one wanted to design a Möbius transformation converging to $\sqrt{n}$ for some non-square $n$, this would require that $d = a, c = 1, b = n$, giving

$$p \mapsto \frac{ap + n}{p + a}$$

for some $a$, and it should not be hard to find a value of $a$ such that the fixed point $\sqrt{n}$ is attractive.

While this technique is a nice trick, because the polynomial $cp^2 + dp = ap + b$ is quadratic, it does not generalize to prove the corresponding result for roots of cubic or higher degree polynomials.


Below I show that Rudin's approximation arises simply by applying by the secant method - a difference analog of Newton's method for finding successively better approximations to roots.enter image description here

As the linked Wikipedia article shows, the recurrence relation for the secant method is as below.

$$\rm S_{n+1}= \dfrac{S_{n-1}\ f\:(S_n) - S_n\ f\:(S_{n-1})}{f\:(S_n)-f\:(S_{n-1})}\qquad\qquad\qquad\qquad$$

For $\rm\ (S_{n-1},S_n,S_{n+1}) = (q,p,p')\ $ and $\rm\ f\:(x) = x^2-d\:,\:$ we obtain

$$\rm p'\ =\ \dfrac{q\:(p^2-d) - p\:(q^2-d)}{p^2-d-(q^2-d)}\ =\ \dfrac{(p-q)\:(p\:q+d)}{p^2-q^2}\ =\ \dfrac{p\:q+d}{p+q}$$

Finally specializing $\rm\: q = 2 = d\: $ yields Rudin's approximation $\rm\displaystyle\ p'\ =\ \frac{2\:p+2}{\ \:p+2}$

The secant method has beautiful connections with the group law on conics. To learn about this folklore, I highly recommend Sam Northshield's Associativity of the Secant Method. The reader already familiar with the group law on elliptic curves, but unfamiliar with the degenerate case of conics, might also find helpful some of Franz Lemmermeyer's expositions, e.g. Conics - a poor man's elliptic curves.

Finally, note this the approximation can be derived purely algebraically as follows.

Given lower and upper approximations to a square-root, we may obtain a better lower approximation $\rm\ p'\ $ by $\:$ "composing" $\:$ them,$\ $ namely:

THEOREM $\rm\displaystyle\quad\ \ q\ >\ \sqrt d\ > \ p\ \ \:\Rightarrow\:\ \ \sqrt d\ > \ p'\ >\ p\quad\ \ for\quad\ p' \:=\ \frac{p\:q+d}{p+q} $

Proof: $\rm\quad\displaystyle 0\ \: >\ (q-\sqrt d)\ \ (p-\sqrt d)\ =\ p\:q+d - (p+q)\:\sqrt d\ \ \Rightarrow\ \ \sqrt d\ >\ p'$

Finally $\rm\quad\quad\displaystyle p'-p\ =\ \frac{p\:q+d}{p+q} - p\ =\ \frac{\ d - p^2}{p+q}\: >\ 0\ \ \Rightarrow\ \ p'\ >\ p$


I had this same question.

For a very clear explanation, see page 25 and 26 of this.