Why is the approximation of Hessian$= J^TJ$ reasonable?
The quadratic model based on the true Hessian is derived from truncating a Taylor series of the objective function as a whole, whereas the quadratic model based on the gauss-Newton hessian is based on truncating a Taylor series of the residual.
Starting with the optimization problem: $$\min_x \frac{1}{2} ||y-f(x)||^2,$$ Consider taking a Taylor series of $f$: $$f(x)=f(x_0)+J(x-x_0)+\text{higher order terms}.$$ The approximate optimization problem formed by truncating the Taylor series, $$\min_x \frac{1}{2} ||y-f(x_0)-J(x-x_0)||^2,$$ has Hessian $J^TJ$.
In general this is not exactly equal to the true Hessian, owing to potential second order cross reactions between other terms in the Taylor series of the residual, but they are equal when $y=f(x_0)$.