Question about manifolds and coordinate transformations
(This answer assumes you know the differential geometry and just want to know how the physicist gets that expression.)
Let $V,W\subset\mathbb{R}^n$ be open, and let $\psi : V\to U$ and $\phi : W\to U'$ be charts (in this answer, maps from open subsets of $\mathbb{R}^n$ into $M$) for open sets $U,U'\subset M$ of some manifold $M$. Then the "change of coordinates" on $U\cap U'$ is the transition function $$ \phi^{-1}\circ \psi : \psi^{-1}(U\cap U')\to \phi^{-1}(U\cap U'),$$ which, assuming for simplicity that $U=U'$, is a map $V\to W$. This function induces a natural map between tangent vectors, the pushforward $$ \mathrm{d}(\phi^{-1}\circ \psi) : TV\to TW,$$ which is given by the Jacobian of the transformation: at every point $x\in V$ we have $$ \mathrm{d}(\phi^{-1}\circ \psi)_x : T_xV\to T_{(\phi^{-1}\circ\psi)(x)}W,\quad v\mapsto J(\phi^{-1}\circ\psi)(x)\cdot v.$$ Written in the standard coordinates of $\mathbb{R}^n$ (which contains both $V$ and $W$), the Jacobian is precisely the matrix with components $\frac{\partial x'^a}{\partial x^b}$ that you ask about, where $x' = \phi^{-1}\circ \psi$.
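To make the pushforward concrete, here is a small Python sketch under an assumed choice of charts: $M$ is the plane, $\psi$ is the polar parametrisation $(r,\theta)\mapsto(r\cos\theta, r\sin\theta)$ restricted to a region where it is injective, and $\phi$ is the identity (Cartesian) chart, so $x' = \phi^{-1}\circ\psi$ is the polar-to-Cartesian map. The helper names (`transition`, `jacobian`, `pushforward`, `velocity`) are hypothetical. The sketch estimates the Jacobian by central differences, compares it with the analytic matrix $\partial x'^a/\partial x^b$, and checks that the Jacobian sends the velocity of a curve in $V$ to the velocity of its image curve in $W$.

```python
import math

def transition(r, theta):
    """x' = phi^{-1} o psi for the assumed charts: polar (r, theta) -> Cartesian (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(f, x, h=1e-6):
    """Central-difference estimate of J[a][b] = d f^a / d x^b at the point x."""
    n = len(x)
    m = len(f(*x))
    J = [[0.0] * n for _ in range(m)]
    for b in range(n):
        xp, xm = list(x), list(x)
        xp[b] += h
        xm[b] -= h
        fp, fm = f(*xp), f(*xm)
        for a in range(m):
            J[a][b] = (fp[a] - fm[a]) / (2 * h)
    return J

def pushforward(J, v):
    """Apply the Jacobian to a tangent vector: (J v)^a = sum_b J[a][b] v^b."""
    return [sum(J[a][b] * v[b] for b in range(len(v))) for a in range(len(J))]

def velocity(f, x, v, h=1e-6):
    """Velocity at t = 0 of the image curve t -> f(x + t v), by central differences."""
    fp = f(*[xi + h * vi for xi, vi in zip(x, v)])
    fm = f(*[xi - h * vi for xi, vi in zip(x, v)])
    return [(fa - fb) / (2 * h) for fa, fb in zip(fp, fm)]

x0 = (2.0, math.pi / 6)  # a point of V, in (r, theta) coordinates
v = (0.5, 1.0)           # components (v^r, v^theta) of a tangent vector at x0

J = jacobian(transition, x0)
# Analytic Jacobian d x'^a / d x^b of the polar-to-Cartesian map.
J_exact = [[math.cos(x0[1]), -x0[0] * math.sin(x0[1])],
           [math.sin(x0[1]),  x0[0] * math.cos(x0[1])]]

print("numerical J:", J)
print("J v:", pushforward(J, v))
# Should agree with J v up to finite-difference error.
print("image curve velocity:", velocity(transition, x0, v))
```

The finite differences are only there so the check does not depend on knowing the analytic Jacobian in advance; for this map the closed form `J_exact` is of course available.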
We want the transition functions $\phi^{-1}\circ\psi$ to be differentiable precisely because we want them to induce this map between tangent vectors. If they are not differentiable, there is no natural induced map on tangent vectors.
Assume that $\varphi:U\rightarrow \mathbb{R}^n$ is a chart function. If $p\in U$ is a point, then we write $$ \varphi(p)=(x^1(p),...,x^n(p)), $$ so the local $\mathbb{R}^n$-valued function $\varphi$ amounts to $n$ local $\mathbb{R}$-valued functions, which are the coordinate functions of the chart.
Consequently (and because $\varphi$ is invertible onto its image), the inverse function is $\varphi^{-1}:\mathbb{R}^n\rightarrow M$. (I am abusing notation here: in general $\varphi^{-1}$ is only defined on $\varphi(U)$, not on all of $\mathbb{R}^n$, so interpret it as a partial function.) Its value on a given $n$-tuple is written as $$ \varphi^{-1}(x^1,...,x^n). $$ Here I am once again abusing notation, because the $x^\mu$ are now variables in $\mathbb{R}^n$. I am not sure exactly what the source of your confusion is, but I assume it has something to do with this: we use $x^\mu$ both as a (local) function from $M$ to $\mathbb{R}$ and as a coordinate/variable in $\mathbb{R}^n$.
We can also say that $$ p=\varphi^{-1}(x^1(p),...,x^n(p)) $$ and here we didn't abuse notation.
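For a concrete (assumed) example of the two roles of $x^\mu$, take $M=S^2\subset\mathbb{R}^3$ with the north pole removed, and let $\varphi$ be stereographic projection from the north pole. In the Python sketch below (all names hypothetical), `x1` and `x2` are the coordinate functions $x^\mu: M\to\mathbb{R}$, which take a point of the sphere and return a number, while `phi_inv` takes a pair of real numbers (the variables $x^\mu$) and returns a point. The final check is the identity $p=\varphi^{-1}(x^1(p),x^2(p))$, which involves no abuse of notation.

```python
def phi(p):
    """Assumed chart: stereographic projection of S^2 minus the north pole (0, 0, 1)."""
    X, Y, Z = p
    return (X / (1 - Z), Y / (1 - Z))

def phi_inv(u1, u2):
    """Inverse chart; u1, u2 play the role of the *variables* x^1, x^2 in R^2."""
    s = u1 * u1 + u2 * u2
    return (2 * u1 / (s + 1), 2 * u2 / (s + 1), (s - 1) / (s + 1))

# The coordinate functions x^mu: M -> R (functions on the manifold, not variables).
def x1(p):
    return phi(p)[0]

def x2(p):
    return phi(p)[1]

def close(a, b, tol=1e-9):
    return all(abs(u - w) < tol for u, w in zip(a, b))

# Points of S^2 (each satisfies X^2 + Y^2 + Z^2 = 1), none of them the north pole.
points = [(0.6, 0.0, 0.8), (0.0, -0.8, -0.6), (0.0, 0.0, -1.0)]
for p in points:
    coords = (x1(p), x2(p))   # numbers: the values x^mu(p)
    back = phi_inv(*coords)   # a point of the sphere again
    print(p, "->", coords, "->", back, close(back, p))
```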
Now let $\psi:V\rightarrow \mathbb{R}^n$ also be a chart function, and assume that $U\cap V\neq\emptyset$. To simplify notation, I'll shrink both $U$ and $V$ to $U\cap V$ so that they coincide, and simply write $U$ for the common coordinate domain.
We can write $$ \psi(p)=(y^1(p),...,y^n(p)), $$ so the coordinate functions of $\psi$ are now denoted by $y$. The inverse statement is $$ p=\psi^{-1}(y^1(p),...,y^n(p)), $$ so we once again abuse notation and think of the inverse function $\psi^{-1}$ as a function of the variables $y^1,...,y^n$.
With this notation, the transition function $\psi\circ\varphi^{-1}$ is a (partial) $\mathbb{R}^n\rightarrow\mathbb{R}^n$ function, whose value on a given element of its domain can be written as $$ (\psi\circ\varphi^{-1})(x^1,...,x^n)=(y^1(\varphi^{-1}(x^1,...,x^n)),...,y^n(\varphi^{-1}(x^1,...,x^n)))=(y^1(x^1,...,x^n),...,y^n(x^1,...,x^n)). $$
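As an assumed example of a transition function, take $M=S^2$ with two stereographic charts: $\varphi$ projecting from the north pole and $\psi$ projecting from the south pole, so that the common domain $U$ is the sphere minus both poles. The Python sketch below (names hypothetical) computes $\psi\circ\varphi^{-1}$ literally as a composition, $y^\nu(\varphi^{-1}(x^1,x^2))$, and compares it with the closed form $y^\nu = x^\nu/((x^1)^2+(x^2)^2)$, which is what the notation $y^\nu(x^1,x^2)$ expresses once $\varphi^{-1}$ has been "forgotten".

```python
def phi_inv(u1, u2):
    """Inverse of stereographic projection from the north pole (0, 0, 1)."""
    s = u1 * u1 + u2 * u2
    return (2 * u1 / (s + 1), 2 * u2 / (s + 1), (s - 1) / (s + 1))

def psi(p):
    """Assumed second chart: stereographic projection from the south pole (0, 0, -1)."""
    X, Y, Z = p
    return (X / (1 + Z), Y / (1 + Z))

def transition(u1, u2):
    """psi o phi^{-1}: x-coordinates -> y-coordinates, computed as a composition."""
    return psi(phi_inv(u1, u2))

def transition_closed_form(u1, u2):
    """The same map written as y^nu(x^1, x^2); valid away from the origin."""
    s = u1 * u1 + u2 * u2
    return (u1 / s, u2 / s)

for x in [(3.0, 4.0), (0.5, -0.25), (-1.0, 1.0)]:
    print(x, transition(*x), transition_closed_form(*x))
```

The origin $x=(0,0)$ corresponds to the south pole, which lies outside the domain of $\psi$; this is why the transition function is only a partial map on $\mathbb{R}^2$.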
Here, in the last equality, we committed a heinous abuse of notation and "forgot about" $\varphi^{-1}$: we simply viewed the transition function $\psi\circ\varphi^{-1}$ as a functional relationship between the dependent variables $y^\mu$ and the independent variables $x^\mu$.
This abuse of notation is very common in differential geometry, even among mathematicians, because a fully pedantic notation would make even simple computations more or less intractable.
About the actual question: the best answer depends on how you like to think about tangent vectors. Usually they are defined either as point derivations on the ring of smooth functions, i.e. maps $ f\mapsto v(f)\in\mathbb{R} $ that are $\mathbb{R}$-linear and satisfy the Leibniz rule $$ v(fg)=v(f)g(p)+f(p)v(g), $$ or as tangent vectors to curves, in which case they are equivalence classes of smooth curves passing through $p$.
The connection between the two can be given by the following: If $\gamma$ is a smooth curve on $M$, passing through $p$ at $t_0$, and $f$ is a smooth function defined in an open neighborhood containing $p$, then the tangent vector of the curve $\gamma$ at $p$ is given by the derivation (at $p$) described as $$ v(f)=\left.\frac{d}{dt}(f\circ\gamma)\right|_{t=t_0}. $$ Furthermore, it can be shown that all derivations arise this way.
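A quick numerical sanity check (a sketch, not a proof) that $v(f)=\frac{d}{dt}(f\circ\gamma)|_{t=t_0}$ is a derivation at $p=\gamma(t_0)$, i.e. linear and obeying the Leibniz rule. For simplicity it assumes $M=\mathbb{R}^2$ with points represented as pairs of floats; the curve `gamma`, the functions `f` and `g`, the value of $t_0$ and the step size are arbitrary illustrative choices, and the derivative is approximated by central differences.

```python
import math

def gamma(t):
    """An arbitrary smooth curve in M = R^2 (assumed example)."""
    return (math.cos(t) + t, math.sin(2 * t))

def f(p):
    return p[0] ** 2 + 3 * p[1]

def g(p):
    return math.exp(p[0]) * p[1]

def v(func, t0=0.7, h=1e-5):
    """v(func) = d/dt (func o gamma) at t0, approximated by a central difference."""
    return (func(gamma(t0 + h)) - func(gamma(t0 - h))) / (2 * h)

p = gamma(0.7)  # the base point p = gamma(t0)

def fg(q):
    return f(q) * g(q)

lhs = v(fg)
rhs = v(f) * g(p) + f(p) * v(g)
print("v(fg)             =", lhs)
print("v(f)g(p)+f(p)v(g) =", rhs)

# Linearity: v(2f + 3g) = 2 v(f) + 3 v(g)
lin_lhs = v(lambda q: 2 * f(q) + 3 * g(q))
lin_rhs = 2 * v(f) + 3 * v(g)
print(lin_lhs, lin_rhs)
```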
I'll use the curve description here, because it makes the behaviour of vector components very easy to examine.
Because $\varphi^{-1}\circ\varphi=\text{Id}$ the identity function, we can write $$ \left.\frac{d}{dt}(f\circ\gamma)\right|_{t=t_0}=\left.\frac{d}{dt}(f\circ\varphi^{-1}\circ\varphi\circ\gamma)\right|_{t=t_0}. $$
But what's $f\circ\varphi^{-1}$? It is the multivariable function that maps the $x$-coordinates to numbers instead of abstract points $p$. And what is $\varphi\circ\gamma$? It is the $\mathbb{R}^n$-valued curve $(\varphi\circ\gamma)(t)=(x^1(t),...,x^n(t))$ (warning!! Heavy abuse of notation here!) that describes a one-parameter family of $x$-coordinates instead of abstract $p$-points!
In particular, we can use the usual chain rule of ordinary calculus to evaluate this derivative, and we get $$ \left.\frac{d}{dt}(f\circ\varphi^{-1}\circ\varphi\circ\gamma)\right|_{t=t_0}=\frac{\partial(f\circ\varphi^{-1})}{\partial x^\mu}\frac{d (x^\mu\circ\gamma)}{d t}=\frac{\partial f}{\partial x^\mu}\frac{d x^\mu}{dt}, $$ where 1) all derivatives are evaluated where needed, 2) in the last equality we once again committed a massive abuse of notation, and 3) the summation convention is in effect.
But this is of course $v(f)$, so we can "decouple" $f$ from it and write $v$ as $$ v=\frac{d x^\mu}{dt}\frac{\partial}{\partial x^\mu}. $$ Once again, the $t$-derivative is evaluated at the correct place. Note that, rigorously, $\partial/\partial x^\mu$ is not a partial derivative but a derivation that acts by taking the partial derivative of the function's $x$-coordinate representation (!!!): $\partial/\partial x^\mu$ acts on $f$, but the actual partial derivative acts on $f\circ\varphi^{-1}$. Hence we can write $$ v=v^\mu\frac{\partial}{\partial x^\mu}, $$ where $v^\mu=dx^\mu/dt|_{t=t_0}$, and we call the $v^\mu$ the components of $v$ in the chart $\varphi$.
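The chain-rule computation can be checked numerically. The Python sketch below assumes $M$ is the plane (points as Cartesian pairs) and takes $\varphi$ to be the polar chart $\varphi(p)=(r,\theta)$ on the half-plane $X>0$, so that $x^1=r$ and $x^2=\theta$. It computes $v(f)$ directly from $f\circ\gamma$, and separately as $v^\mu\,\partial(f\circ\varphi^{-1})/\partial x^\mu$ with $v^\mu=d(x^\mu\circ\gamma)/dt$. Note that the partial derivatives are taken of `f_coord`, i.e. $f\circ\varphi^{-1}$, not of `f` itself. All function names and the particular $f$, $\gamma$ and $t_0$ are illustrative assumptions.

```python
import math

def phi(p):
    """Assumed chart: polar coordinates (r, theta) on the half-plane X > 0."""
    X, Y = p
    return (math.hypot(X, Y), math.atan2(Y, X))

def phi_inv(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def f(p):
    """A smooth function on M, defined on points (here: Cartesian pairs)."""
    X, Y = p
    return X * Y + math.sin(X)

def gamma(t):
    """A smooth curve that stays in the half-plane X > 0 near t0."""
    return (2.0 + t, t * t - 0.5)

def d_dt(func, t0, h=1e-6):
    return (func(t0 + h) - func(t0 - h)) / (2 * h)

def partial(F, x, mu, h=1e-6):
    """Ordinary partial derivative of F: R^n -> R with respect to x^mu at x."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    return (F(*xp) - F(*xm)) / (2 * h)

t0 = 0.3

def f_coord(r, theta):
    """f o phi^{-1}: the x-coordinate representation of f."""
    return f(phi_inv(r, theta))

x_at_p = phi(gamma(t0))  # coordinates of p = gamma(t0)

# Components v^mu = d(x^mu o gamma)/dt at t0.
v_comp = [d_dt(lambda t, mu=mu: phi(gamma(t))[mu], t0) for mu in range(2)]

direct = d_dt(lambda t: f(gamma(t)), t0)
via_chart = sum(v_comp[mu] * partial(f_coord, x_at_p, mu) for mu in range(2))
print(direct, via_chart)
```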
We can also check that $$ \frac{\partial}{\partial x^\mu}(x^\nu)=\frac{\partial (x^\nu\circ\varphi^{-1})}{\partial x^\mu}=\delta^\nu_\mu, $$ so we have $$ v^\nu=v(x^\nu). $$
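We can then ask: what are the components of $v$ with respect to the coordinates $y$? We evaluate $$ v(y^\nu)=v^\mu\frac{\partial}{\partial x^\mu}(y^\nu)=v^\mu\frac{\partial(y^\nu\circ\varphi^{-1})}{\partial x^\mu}=v^\mu\frac{\partial y^\nu}{\partial x^\mu}, $$ where the last equality is, once again, essentially an abuse of notation. Since the components of $v$ in a chart are the values of $v$ on the coordinate functions, the components of $v$ in the chart $\psi$ are $v(y^\nu)=\frac{\partial y^\nu}{\partial x^\mu}v^\mu$, which is the familiar transformation law.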
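The relation $v(y^\nu)=v^\mu\,\partial y^\nu/\partial x^\mu$ can be checked numerically. The Python sketch below assumes $M$ is the plane, $\varphi$ is the Cartesian chart ($x^1=X$, $x^2=Y$) and $\psi$ is the polar chart ($y^1=r$, $y^2=\theta$) on the half-plane $X>0$. It computes $v(y^\nu)$ directly as $\frac{d}{dt}(y^\nu\circ\gamma)$ and compares it with $v^\mu\,\partial y^\nu/\partial x^\mu$, using the analytic Jacobian $\partial r/\partial X = X/r$, $\partial r/\partial Y=Y/r$, $\partial\theta/\partial X=-Y/r^2$, $\partial\theta/\partial Y=X/r^2$. The curve and the function names are hypothetical.

```python
import math

def y(p):
    """Assumed chart psi: polar coordinates (r, theta) on the half-plane X > 0."""
    X, Y = p
    return (math.hypot(X, Y), math.atan2(Y, X))

def gamma(t):
    """A smooth curve; with phi the Cartesian chart, gamma(t) is also (x^1(t), x^2(t))."""
    return (1.5 + math.sin(t), 0.2 + 2 * t)

def d_dt(func, t0, h=1e-6):
    return (func(t0 + h) - func(t0 - h)) / (2 * h)

def jacobian_y_wrt_x(p):
    """Analytic matrix dy^nu/dx^mu at p (rows nu = r, theta; columns mu = X, Y)."""
    X, Y = p
    r2 = X * X + Y * Y
    r = math.sqrt(r2)
    return [[X / r, Y / r],
            [-Y / r2, X / r2]]

t0 = 0.4
p = gamma(t0)

# Components in the x chart: v^mu = d(x^mu o gamma)/dt.
v_x = [d_dt(lambda t, mu=mu: gamma(t)[mu], t0) for mu in range(2)]

# Components in the y chart, computed directly as v(y^nu) = d(y^nu o gamma)/dt.
v_y_direct = [d_dt(lambda t, nu=nu: y(gamma(t))[nu], t0) for nu in range(2)]

# Components in the y chart from the transformation law v'^nu = (dy^nu/dx^mu) v^mu.
J = jacobian_y_wrt_x(p)
v_y_law = [sum(J[nu][mu] * v_x[mu] for mu in range(2)) for nu in range(2)]

print(v_y_direct, v_y_law)
```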
Recall that if $X$ is a vector space with basis $(e_1,\ldots,e_n)$ and corresponding dual basis $(e^1,\ldots,e^n)$, then $w = e^i(w)e_i$, for all $w\in X$.
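A small sketch of this identity in $X=\mathbb{R}^2$, with an arbitrarily chosen (assumed) non-orthonormal basis: if the basis vectors $e_i$ are the columns of a matrix $E$, the dual basis covectors $e^i$ are the rows of $E^{-1}$, because that is exactly the condition $e^i(e_j)=\delta^i_j$. The code then reconstructs $w$ as $e^i(w)\,e_i$. The helper names are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists (assumes nonzero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Basis vectors e_1, e_2 (assumed example), stored as the columns of E.
e = [(2.0, 1.0), (1.0, 3.0)]
E = [[e[0][0], e[1][0]],
     [e[0][1], e[1][1]]]

# Dual basis e^1, e^2: the rows of E^{-1}, so that e^i(e_j) = delta^i_j.
dual = inverse_2x2(E)

def apply_covector(row, w):
    """Evaluate the covector with components `row` on the vector w."""
    return sum(rk * wk for rk, wk in zip(row, w))

w = (4.0, -1.0)
coeffs = [apply_covector(dual[i], w) for i in range(2)]                   # e^i(w)
rebuilt = [sum(coeffs[i] * e[i][k] for i in range(2)) for k in range(2)]  # e^i(w) e_i
print(coeffs, rebuilt)
```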
We apply this to every tangent space of the manifold. The basis is $(\partial/\partial x^1,\ldots,\partial/\partial x^n)$ and the dual basis is $(dx^1,\ldots,dx^n)$, which means that $V^a=dx^a(V)$. Similarly, $V'^a=dx'^a(V)$. Now, the chain rule says that $$dx'^a =\frac{\partial x'^a}{\partial x^b}\, dx^b ,$$ whence applying both sides to $V$ gives $$V'^a =\frac{\partial x'^a}{\partial x^b} V^b, $$ as wanted.
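As a worked example, with an assumed choice of coordinates, take $(x^1,x^2)=(x,y)$ Cartesian and $(x'^1,x'^2)=(r,\theta)$ polar on a region of the plane where both are defined. The chain rule gives $$ dr=\frac{x}{r}\,dx+\frac{y}{r}\,dy,\qquad d\theta=-\frac{y}{r^2}\,dx+\frac{x}{r^2}\,dy, $$ and applying both to $V=V^x\,\partial/\partial x+V^y\,\partial/\partial y$ yields $$ V^r=\frac{x\,V^x+y\,V^y}{r},\qquad V^\theta=\frac{x\,V^y-y\,V^x}{r^2}. $$ For instance, the rotation generator $V=-y\,\partial/\partial x+x\,\partial/\partial y$ has $V^r=0$ and $V^\theta=(x^2+y^2)/r^2=1$, i.e. $V=\partial/\partial\theta$.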