What is the motivation for defining the weak derivative the way it is?
Laurent Schwartz had the idea. Distributions are linear functionals defined on test functions. They generalize functions in this sense. If $u$ is an actual function, then the corresponding distribution is the linear functional $$ \phi \mapsto \int u \;\phi\;dx $$ A linear functional not of this form may still be considered a generalized function.
Now if $u$ is a function and the derivative $v = D^\alpha u$ actually exists, then integration by parts shows $$ \int_{\Omega}v\;\phi\,dx = (-1)^{|\alpha|}\int_{\Omega} u \,D^{\alpha}\phi\;dx $$ [no boundary terms because $\phi$ vanishes outside a compact subset of $\Omega$]
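To spell out the simplest case, take one variable, $|\alpha| = 1$, and $\Omega = (a,b)$ (the interval is an assumption made here only for illustration), with $u$ continuously differentiable. Since $\phi$ vanishes near both endpoints, $$ \int_a^b u'\,\phi\,dx = \big[u\,\phi\big]_a^b - \int_a^b u\,\phi'\,dx = -\int_a^b u\,\phi'\,dx $$ Each further derivative moved onto $\phi$ contributes another factor of $-1$, which is where $(-1)^{|\alpha|}$ comes from.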
Next, if $u$ is a (locally integrable) function but $D^\alpha u$ does not exist in the classical sense, it is still true that the functional $$ \phi \mapsto (-1)^{|\alpha|}\int_{\Omega} u\,D^{\alpha}\phi\;dx $$ makes sense and defines a generalized function (a.k.a. Schwartz distribution). So things work out if we go ahead and call this functional $D^\alpha u$. [In general it is a distribution, not necessarily a function. When it happens to be given by a locally integrable function $v$, that $v$ is called the weak derivative of $u$.]
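As a concrete instance of this definition (an example chosen here for illustration, not taken from the question), let $\Omega = (-1,1)$ and $u(x) = |x|$, which is not differentiable at $0$. Splitting the integral at $0$ and integrating by parts on each half gives $$ -\int_{-1}^{1} |x|\,\phi'(x)\,dx = \int_{0}^{1}\phi\,dx - \int_{-1}^{0}\phi\,dx = \int_{-1}^{1} \operatorname{sgn}(x)\,\phi(x)\,dx $$ so $Du = \operatorname{sgn}$, an honest function. Doing it once more, $$ \int_{-1}^{1} |x|\,\phi''(x)\,dx = -\int_{-1}^{1} \operatorname{sgn}(x)\,\phi'(x)\,dx = 2\,\phi(0) $$ so $D^2 u = 2\delta_0$, which is a distribution but not a function.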