What is the derivative of the expected value of a continuous random variable?
I believe what you need is a direct application of Leibniz's Rule. It requires continuity on the integrand and its partial derivative, so I believe that distributions can be constructed in which this does not hold (though perhaps such cases can be handled by the measure-theoretic formulation).
The question is why $M'(t_0) = E(Xe^{t_0X})$. To answer that, it is reasonable to assume that $M$ is finite and bounded in a neighborhood of $t_0$, say $(t_0-\epsilon, t_0+\epsilon)$. Consider $h_n$ a sequence that goes to $0$ as $n\to \infty$ and note that
$$\frac{M(t_0+h_n)-M(t_0)}{h_n} = E\left(e^{t_0X}\frac{e^{h_nX}-1}{h_n} \right)$$
To conclude it suffices to prove that $\displaystyle \lim_n E\left(e^{t_0X}\frac{e^{h_nX}-1}{h_n} \right) =E(Xe^{t_0X}) $.
By definition of the derivative, $\displaystyle e^{t_0X}\frac{e^{h_nX}-1}{h_n}$ converges pointwise to $Xe^{t_0X}$, so it suffices to dominate it. Note that $$\forall h\in \mathbb R, \forall x\in \mathbb R, \left|\frac{e^{hx}-1}{h}\right| = \left|\int_0^x e^{hu}du\right|\leq |x|e^{|hx|}\leq |x|(e^{-hx}+e^{-hx})$$
hence $$\left|e^{t_0X}\frac{e^{h_nX}-1}{h_n}\right|\leq |X|(e^{(t_0-h_n)X}+e^{(t_0+h_n)X})$$
Let $N$ be such that $n\geq N\implies |h_n|\leq \frac \epsilon 2$. Note that there exists some $A_\epsilon>0$ such that $$|x|\geq A_\epsilon \implies |x|\leq e^{(\epsilon/ 2) |x|}$$ so that $\forall x\in \mathbb R, |x|\leq A_\epsilon + e^{(\epsilon/ 2) |x|}\leq A_\epsilon + e^{-(\epsilon/ 2) x}+e^{(\epsilon/ 2) x}$.
For $n\geq N$, $$\begin{aligned} |X|(e^{(t_0-h_n)X}+e^{(t_0+h_n)X}) &\leq (A_\epsilon + e^{-(\epsilon/ 2) x}+e^{(\epsilon/ 2) x})(e^{(t_0-h_n)X}+e^{(t_0+h_n)X})\\ &= A_\epsilon (e^{(t_0-h_n)X}+e^{(t_0+h_n)X}) \begin{aligned}[t]&+ e^{(t_0-h_n-(\epsilon/ 2))X} + e^{(t_0-h_n+(\epsilon/ 2))X}\\ &+e^{(t_0+h_n-(\epsilon/ 2))X} +e^{(t_0+h_n+(\epsilon/ 2))X}\end{aligned} \end{aligned}$$
By the assumption that $M$ is bounded on $(t_0-\epsilon, t_0+\epsilon)$ and the previous bound, we get the domination needed to apply the dominated convergence theorem.
So $$M'(t_0) = E(Xe^{t_0X})$$
Actually, the reason is that both the operators ${d\over dt}$ and $\int$ are linear operators over the functions space which we will recall as $F$. The reason is easy. For the derivative operator we have $$\forall f_1,f_2\in F\quad,\quad {d\over dt}(af_1+bf_2)=a{d\over dt}f_1+b{d\over dt}f_2$$and for the integral $$\int_X af_1+bf_2=a\int_Xf_1+b\int_Xf_2$$
The general result is that
$$\text{Any two linear operators can be swapped over the operands.}$$
The concept of expected value
As intuition says, to obtain a simple average from a set of data, we sum the data up and divide the result over their total number. Say, the data $\{x_1,x_2,\cdots ,x_n\}$ are given. The average of these data, which we denote by $\bar x$, becomes$$\bar x={1\over n}\sum_{i=1}^{n}x_i$$similarly for a continuous data such as a function like $g(x)$, the average over an interval $[a,b]$ becomes $$\bar g={1\over b-a}\int_a^b g(x)dx$$as long as the integral exists and is bounded.
To some reasons, one subset of data matters to us more than the others so as they must also be more determining in the total average. To respect this case, we define a weighted average with a weight function $f(x)$ to apply to $g(x)$. The weighted will then become$$\bar g_w=\int_a^b{g(x)f(x)dx}$$where$${\int_a^b f(x)=1\\f(x)\ge 0\quad,\quad x\in [a,b]}$$The mathematical expectation of a function $g(x)$ under the PDF of a random variable is then defined as $$E\{g(x)\}=\int_Xg(x)f(x)dx$$where the PDF $f(x)$ actually determines which parts of $g(x)$ matter more and which matter less.