Fastest Square Root Algorithm
If you use Halley's method, you get cubic convergence! It is the second member of the class of Householder's methods.
Halley's method is:
$$
x_{n+1} = x_n - \frac{2f(x_n)f'(x_n)}{2[f'(x_n)]^2-f(x_n)f''(x_n)}
$$
If we let $f(x) = x^2 - a$, which meets the criteria (it has a continuous second derivative), then with $f'(x_n) = 2x_n$ and $f''(x_n) = 2$, Halley's method becomes:
$$
x_{n+1} = x_n - \frac{\left(2x_n^3 - 2ax_n\right)}{3x_n^2 + a}
$$
which simplifies to:
$$
x_{n+1} = \frac{x_n^3 + 3ax_n}{3x_n^2 + a}
$$
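As a concrete illustration, here is a minimal C sketch of the simplified Halley iteration above. The function name `halley_sqrt`, the starting guess, the tolerance, and the iteration cap are my own choices, not part of the method:

```c
#include <math.h>   /* for fabs */

/* Square root of a > 0 via Halley's method:
 * x_{n+1} = (x_n^3 + 3*a*x_n) / (3*x_n^2 + a), which converges cubically. */
double halley_sqrt(double a, double x0, double tol)
{
    double x = x0;
    for (int i = 0; i < 100; ++i) {                  /* safety cap */
        double next = (x * x * x + 3.0 * a * x) / (3.0 * x * x + a);
        if (fabs(next - x) <= tol * fabs(next))
            return next;                             /* relative change below tol */
        x = next;
    }
    return x;
}
```

With $a = 2$, $x_0 = 1$, and `tol` near machine epsilon, this reaches full double precision in about three iterations.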
I will also add this reference, which discusses extensions of Newton's method.
There exists an extension due to Potra and Pták called the "two-step method" that may be rewritten as the iterative scheme
$$x_{n+1} = x_n - \frac{f(x_n) + f\left(x_n - \frac{f(x_n)}{f'(x_n)}\right)}{f'(x_n)},$$
which converges cubically in some neighborhood of the root $x^{*}$ and does not require the computation of the second derivative.
See "On Newton-type methods with cubic convergence" for more information on this topic.
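Since the two-step scheme only needs $f$ and $f'$, it is easy to specialize to $f(x) = x^2 - N$. Here is a minimal C sketch under that specialization; the name `potra_ptak_sqrt` and the loop control are my own additions:

```c
#include <math.h>   /* for fabs */

/* Potra-Ptak two-step method for f(x) = x*x - N:
 *   y      = x - f(x)/f'(x)              (an inner Newton step)
 *   x_next = x - (f(x) + f(y)) / f'(x)
 * Cubic convergence near the root, no second derivative required. */
double potra_ptak_sqrt(double N, double x0, double tol)
{
    double x = x0;
    for (int i = 0; i < 100; ++i) {          /* safety cap */
        double fx  = x * x - N;
        double dfx = 2.0 * x;
        double y   = x - fx / dfx;           /* inner Newton step */
        double next = x - (fx + (y * y - N)) / dfx;
        if (fabs(next - x) <= tol * fabs(next))
            return next;
        x = next;
    }
    return x;
}
```

Note that each iteration costs two function evaluations and one derivative evaluation, which illustrates the extra per-iteration work these higher-order methods carry.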
As Hurkyl and others have noted, your best bet is just to use Newton's method. These alternative methods generally require more operations per iteration, so they aren't really worth the extra computational cost, but they make a good comparison.
Newton's method for solving $f(x)=x^2-N=0$ leads to the recurrence
$x_{n+1}=x_n-\frac{x_n^2-N}{2x_n}=\frac{x_n+N/x_n}2$, also known as Heron's method. Since $f'(x)\ne0$ at the root, the convergence is quadratic (i.e. the number of correct decimals doubles with each step once a threshold precision is reached).
The results depend on the starting value, of course. Simply guessing $x_0=1$ leads to
$$x_{1} = 493827161.00000000000\\
x_{2} = 246913581.49999999899\\
x_{3} = 123456792.74999998937\\
x_{4} = 61728400.374999909634\\
x_{5} = 30864208.187499266317\\
x_{6} = 15432120.093744108961\\
x_{7} = 7716092.0468278285538\\
x_{8} = 3858110.0230600438248\\
x_{9} = 1929183.0086989850523\\
x_{10} = 964847.48170274167713\\
x_{11} = 482935.55973452582660\\
x_{12} = 242490.33277426247529 \\
x_{13} = 123281.64823302696814 \\
x_{14} = 65646.506775513694016 \\
x_{15} = 40345.773393104621684 \\
x_{16} = 32412.760144718719221 \\
x_{17} = 31441.958847358050036 \\
x_{18} = 31426.971626562861740 \\
x_{19} = 31426.968052932067262 \\
x_{20} = 31426.968052931864079 $$
with small enough error.
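For reference, here is a minimal C sketch of the Heron iteration that reproduces the run above. The table's $x_1 = (1 + N)/2 = 493827161$ implies $N = 987654321$; the tolerance and iteration cap are my own choices:

```c
#include <stdio.h>
#include <math.h>   /* for fabs */

/* Heron's method: x_{n+1} = (x_n + N/x_n) / 2 */
double heron_sqrt(double N, double x0, double tol)
{
    double x = x0;
    for (int i = 0; i < 200; ++i) {          /* safety cap */
        double next = 0.5 * (x + N / x);
        if (fabs(next - x) <= tol * fabs(next))
            return next;
        x = next;
    }
    return x;
}

int main(void)
{
    /* x_1 = (1 + N)/2 = 493827161 in the table above implies N = 987654321 */
    printf("%.11f\n", heron_sqrt(987654321.0, 1.0, 1e-15));
    return 0;
}
```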
Not a bona fide "algorithm", but a cute hack nevertheless, one that I once used in code that required taking an inverse square root millions of times (back when I was doing computational astrophysics); it can be found here:
http://en.wikipedia.org/wiki/Fast_inverse_square_root
It does use a few iterations of Newton's method, but only after some very, very clever trickery.
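For completeness, here is a sketch of the trick in C, adapted from the widely circulated version described at that link: the float's bit pattern is reinterpreted as an integer, shifted, and subtracted from a magic constant to produce a startlingly good first guess, which one Newton step then refines. The `memcpy`-based type punning is my substitution for the original's pointer cast, which is undefined behavior in standard C:

```c
#include <stdint.h>
#include <string.h>   /* for memcpy */

/* Approximate 1/sqrt(x) via the magic-constant bit trick, followed by
 * one Newton iteration for g(y) = 1/y^2 - x. */
float fast_rsqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);            /* reinterpret float bits as an integer */
    i = 0x5f3759df - (i >> 1);           /* the famous magic constant and shift */
    memcpy(&y, &i, sizeof y);

    y = y * (1.5f - 0.5f * x * y * y);   /* one Newton refinement step */
    /* repeating the line above would tighten the error further */
    return y;
}
```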
I remember naively using trial-and-error optimization to find a "magic number" that would come closest to a direct square root. Of course it was much slower (though still faster than just calling "sqrt" from math.h) and had a higher error than the above hack.