How do you determine which representation of a function to use for Newton's method?
For your curiosity.
Asking this far-from-innocent question, you are almost asking me what I have been doing during the last sixty years.
My answer is: what is the transform of $f(x)$ which makes it the most linear around the solution? If you find it, you will save a lot of iterations.
One case I enjoy for demonstration is $$f(x)=\sum_{i=1}^n a_i^x- b^x$$ where $1<a_1<a_2< \cdots < a_n$ and $b > a_n$; on this site, you can find many problems of this kind.
As written, this function is very badly behaved: it varies very fast and is highly nonlinear. Moreover, it goes through a maximum (not easy to locate), and you must start to the right of it to converge.
Now, consider the transform $$g(x)=\log\left(\sum_{i=1}^n a_i^x \right)- x\log(b)$$ It is almost a straight line! This means that you can start from almost anywhere and converge fast.
For illustration purposes, let us try with $n=6$, $a_i=p_i$ and $b=p_{n+1}$ ($p_i$ being the $i^{th}$ prime number). Being very lazy and using $x_0=0$, the iterates are $$\left( \begin{array}{cc} n & x_n \\ 0 & 0 \\ 1 & 1.607120621 \\ 2 & 2.430003204 \\ 3 & 2.545372693 \\ 4 & 2.546847896 \\ 5 & 2.546848123 \end{array} \right)$$ Using $f(x)$, you must start iterating with $x_0 > 2.14$ to have convergence (big work to find that out!). Let us try with $x_0=2.2$; the successive iterates are
$$\left( \begin{array}{cc} n & x_n \\ 0 & 2.200000000 \\ 1 & 4.561765400 \\ 2 & 4.241750505 \\ 3 & 3.929819520 \\ 4 & 3.629031502 \\ 5 & 3.344096332 \\ 6 & 3.082467015 \\ 7 & 2.856023930 \\ 8 & 2.682559774 \\ 9 & 2.581375720 \\ 10 & 2.549595979 \\ 11 & 2.546866878 \\ 12 & 2.546848124 \\ 13 & 2.546848123 \end{array} \right)$$
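If you want to reproduce both tables, here is a minimal Python sketch (the helper `newton`, the stopping tolerance and the iteration cap are my own arbitrary choices):

```python
import math

# The example above: a_1..a_6 = first six primes, b = the seventh.
primes = [2, 3, 5, 7, 11, 13]
b = 17

def f(x):
    """Original form: sum a_i^x - b^x (stiff and highly nonlinear)."""
    return sum(p**x for p in primes) - b**x

def fp(x):
    return sum(math.log(p) * p**x for p in primes) - math.log(b) * b**x

def g(x):
    """Log transform: log(sum a_i^x) - x log(b), almost a straight line."""
    return math.log(sum(p**x for p in primes)) - x * math.log(b)

def gp(x):
    s = sum(p**x for p in primes)
    return sum(math.log(p) * p**x for p in primes) / s - math.log(b)

def newton(h, hp, x0, tol=1e-9, itmax=50):
    """Plain Newton iteration; returns the root and the iteration count."""
    x = x0
    for k in range(itmax):
        step = h(x) / hp(x)
        x -= step
        if abs(step) < tol:
            break
    return x, k + 1

print(newton(g, gp, 0.0))   # ~5 iterations from x0 = 0
print(newton(f, fp, 2.2))   # ~13 iterations, and only because x0 > 2.14
```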
The last point I would like to mention: even if you have a good guess $x_0$ of the solution, search for a transform $g(x)$ such that $g(x_0)\,g''(x_0) >0$ (this is Darboux's theorem) in order to avoid any overshoot of the solution along the way.
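To see that sign condition at work on a toy case, here is a small Python sketch of my own (the condition is applied to $f(x)=x^2-2$ directly rather than to a transform; the function and starting points are purely illustrative):

```python
# Sketch of the no-overshoot condition f(x0) * f''(x0) > 0,
# using f(x) = x^2 - 2, whose positive root is sqrt(2).
def newton_trace(h, hp, x0, n=5):
    """Return the first n+1 Newton iterates starting from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] - h(xs[-1]) / hp(xs[-1]))
    return xs

f  = lambda x: x * x - 2      # f''(x) = 2 > 0 everywhere
fp = lambda x: 2 * x

# x0 = 2:   f(2) = 2 > 0, so f(x0)*f''(x0) > 0 -> monotone descent to sqrt(2)
print(newton_trace(f, fp, 2.0))
# x0 = 0.5: f(0.5) < 0, the condition fails -> the first step overshoots the root
print(newton_trace(f, fp, 0.5))
```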
Newton's method is very robust once you are near a root; exactly which function you choose to run it on is not of much consequence, since it will converge quickly no matter what. You might as well just use whatever is easiest.
To be more precise, let $f$ be the function we wish to find a root of and define $g(x)=x-\frac{f(x)}{f'(x)}$ to be the function performing a single step of Newton's method. First: it should be clear that things go badly if $f'$ and $f$ share a root, so let's assume they don't - and we'll assume that $f$ is smooth near the root, for convenience.
The idea is that $g$ pulls everything near a root to the root, and to quantify how fast, we can take a look at its Taylor series at a root. Let $x_0$ be a root of $f$. If we differentiate $g$ and simplify at $x_0$ by noting that $f(x_0)=0$, we see the following: $$g(x_0)=x_0$$ $$g'(x_0)=0$$ $$g''(x_0)=\frac{f''(x_0)}{f'(x_0)}$$ In particular, this tells us that, once we are close enough to $x_0$, if the current amount of error is $e$, one more application of $g$ makes the error roughly $\frac{g''(x_0)}{2}e^2$. This is pretty fantastic irrespective of what $g''(x_0)$ is: if you take a small number and square it over and over, it gets really small really fast, which is exactly what we want to happen to the error. I suppose if you were really eager, you could try to make $g''(x_0)$ small by choosing $f$ carefully, but realistically it does not matter, since running one additional step of the method completely dwarfs any gains from fine-tuning $f$.
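If you want to watch the error square itself, here is a small Python sketch (the test equation $\cos x = x$ and its hard-coded root are just convenient choices of mine):

```python
import math

# Quadratic convergence: the error is roughly squared at every step,
# so the number of correct digits doubles. Test on f(x) = cos(x) - x.
f  = lambda x: math.cos(x) - x
fp = lambda x: -math.sin(x) - 1

root = 0.7390851332151607    # the root, computed once in advance
x = 1.0
for k in range(6):
    print(k, abs(x - root))  # watch the exponent roughly double each line
    x -= f(x) / fp(x)
```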
This said, Newton's method can behave pretty badly on a global scale; even for polynomials it can do really nasty things*. Choosing your function carefully isn't likely to help you much here either: where a function has a very small derivative the method is unstable, and where it has a rapidly growing derivative (like near an asymptote) the method converges very slowly. Basically, this isn't a good method for finding a root without having any idea of where one might be, and fine-tuning $f$ won't help you there either.
(*Example: look up "Newton's method basins of attraction" on Google; you'll get amazing plots of where the method converges if you run it on polynomials as simple as $x^3-1$, starting at various points in the complex plane. Basically, when it's not near a root, anything can happen.)
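If you'd rather compute such a picture than search for it, here is a rough sketch (grid, resolution and tolerances are all arbitrary choices of mine) that labels each starting point with the root it ends up at:

```python
import cmath

# Basins of attraction for z^3 - 1 = 0: from each grid point in the
# complex plane, run Newton and record which cube root of unity it hits.
roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def basin(z, itmax=50, tol=1e-8):
    for _ in range(itmax):
        if abs(z) < tol:              # derivative 3z^2 vanishes near 0
            return -1
        z -= (z**3 - 1) / (3 * z**2)  # Newton step for z^3 - 1
        for i, r in enumerate(roots):
            if abs(z - r) < tol:
                return i
    return -1                          # never settled on any root

# Crude ASCII picture of the square [-2, 2] x [-2, 2]:
# '0'/'1'/'2' mark the three basins, '.' marks non-convergence.
for y in range(20, -21, -2):
    print("".join(".012"[basin(complex(x / 10, y / 10)) + 1]
                  for x in range(-20, 21)))
```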
It depends on the derivative of your function at its zeros.
The iteration formula is $$x_{n+1} = x_n-\frac {f(x_n)}{f'(x_n)}$$
Thus you get a better (faster) result if the derivative of the function is larger in magnitude at the zeros of $f(x)$, since the correction step $f(x_n)/f'(x_n)$ then stays well behaved.
The process becomes very chaotic if the derivative at a starting point is close to zero.
As a very interesting case, you may consider $$y=\sin x$$ and try to find $x=\pi$ starting at a point close to $\pi/2$, where the derivative $\cos x$ is nearly zero; see the sketch below.
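A tiny sketch of that failure mode (the offset $0.01$ from $\pi/2$ is an arbitrary choice):

```python
import math

# Newton on sin(x) started just below pi/2, where cos(x) is nearly zero:
# the step f/f' = tan(x) is enormous, so the first iterate lands dozens
# of periods away and the method converges to a far-away zero, not pi.
x = math.pi / 2 - 0.01
for k in range(5):
    print(k, x)
    x -= math.tan(x)   # Newton step for sin: sin(x)/cos(x) = tan(x)
```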
Without knowing the zeros of $f(x)$, making this decision is not an easy task.