Deriving Lagrangian density for electromagnetic field
Abstract
In the following we'll prove that a compatible Lagrangian density for the electromagnetic field in empty space is \begin{equation} \mathcal{L}_{em}\:=\:\epsilon_{0}\cdot\dfrac{\left|\!\left|\mathbf{E}\right|\!\right|^{2}-c^{2}\left|\!\left|\mathbf{B}\right|\!\right|^{2}}{2}-\rho \phi + \mathbf{j} \boldsymbol{\cdot} \mathbf{A} \tag{045} \end{equation} that is the Euler-Langrange equations produced from this Lagrangian are the Maxwell equations for the electromagnetic field.
This Lagrangian density is derived by a trial and error (1) procedure, not by guessing.
1. Introduction
The Maxwell's differential equations of electromagnetic field in empty space are \begin{align} \boldsymbol{\nabla} \boldsymbol{\times} \mathbf{E} & = -\frac{\partial \mathbf{B}}{\partial t} \tag{001a}\\ \boldsymbol{\nabla} \boldsymbol{\times} \mathbf{B} & = \mu_{0}\mathbf{j}+\frac{1}{c^{2}}\frac{\partial \mathbf{E}}{\partial t} \tag{001b}\\ \nabla \boldsymbol{\cdot} \mathbf{E} & = \frac{\rho}{\epsilon_{0}} \tag{001c}\\ \nabla \boldsymbol{\cdot}\mathbf{B}& = 0 \tag{001d} \end{align} where $\: \mathbf{E} =$ electric field intensity vector, $\:\mathbf{B}=$ magnetic-flux density vector, $\:\rho=$ electric charge density, $\:\mathbf{j} =$ electric current density vector. All quantities are functions of the three space coordinates $\:\left( x_{1},x_{2},x_{3}\right) \equiv \left( x,y,z\right)\:$ and time $\:t \equiv x_{4}\:$.
From equation (001d) the magnetic-flux vector $\:\mathbf{B}\:$ may be expressed as the curl of a vector potential $\:\mathbf{A}\:$ \begin{equation} \mathbf{B}=\boldsymbol{\nabla} \boldsymbol{\times} \mathbf{A} \tag{002} \end{equation} and from (002) equation (001a) yields \begin{equation} \boldsymbol{\nabla} \boldsymbol{\times}\left(\mathbf{E}+\frac{\partial \mathbf{A}}{\partial t}\right) =\boldsymbol{0} \tag{003} \end{equation} so the parentheses term may be expressed as the gradient of a scalar function \begin{equation*} \mathbf{E}+\frac{\partial \mathbf{A}}{\partial t} =-\boldsymbol{\nabla}\phi \end{equation*} that is \begin{equation} \mathbf{E} =-\boldsymbol{\nabla}\phi -\frac{\partial \mathbf{A}}{\partial t} \tag{004} \end{equation} So the six scalar variables, the components of vectors $\:\mathbf{E}\:$ and$\:\mathbf{B}\:$, can be expressed as functions of 4 scalar variables, the scalar potential $\:\phi\:$ and three components of vector potential $\:\mathbf{A}$.
Inserting the expressions of $\:\mathbf{E}\:$ and $\:\mathbf{B}\:$, equations (002) and (004) respectively, in equations (001b) and (001c) we have
\begin{equation}
\boldsymbol{\nabla} \boldsymbol{\times} \left(\boldsymbol{\nabla} \boldsymbol{\times} \mathbf{A}\right) =\mu_{0}\mathbf{j}+\frac{1}{c^{2}}\frac{\partial }{\partial t}\left(-\boldsymbol{\nabla}\phi -\frac{\partial \mathbf{A}}{\partial t}\right)
\tag{005}
\end{equation}
and
\begin{equation}
\boxed{\:
-\nabla^{2}\phi-\frac{\partial }{\partial t}\left(\nabla \boldsymbol{\cdot}\mathbf{A}\right) =\frac{\rho}{\epsilon_{0}}
\:}
\tag{006}
\end{equation}
Given that
\begin{equation}
\boldsymbol{\nabla} \boldsymbol{\times} \left( \boldsymbol{\nabla} \boldsymbol{\times} \mathbf{A}\right) =\boldsymbol{\nabla}\left(\nabla \boldsymbol{\cdot}\mathbf{A}\right)- \nabla^{2}\mathbf{A}
\tag{007}
\end{equation}
equation (005) yields
\begin{equation}
\boxed{\:
\frac{1}{c^{2}}\frac{\partial^{2}\mathbf{A}}{\partial t^{2}}-\nabla^{2}\mathbf{A}+ \boldsymbol{\nabla}\left(\nabla \boldsymbol{\cdot} \mathbf{A}+\frac{1}{c^{2}}\frac{\partial \phi}{\partial t}\right) =\mu_{0}\mathbf{j}
\:}
\tag{008}
\end{equation}
2. The Euler-Lagrange equations of EM Field
Now, our main task is to find a Lagrangian density $\:\mathcal{L}\:$, function of the four ''field coordinates'' and their 1rst order derivatives
\begin{equation}
\mathcal{L}=\mathcal{L}\left(\eta_{\jmath}, \overset{\centerdot}{\eta}_{\jmath}, \boldsymbol{\nabla}\eta_{\jmath}\right) \qquad \left(\jmath=1,2,3,4\right)
\tag{009}
\end{equation}
such that the four scalar electromagnetic field equations (006) and (008) are derived from the Lagrange equations
\begin{equation}
\frac{\partial }{\partial t}\left[\frac{\partial \mathcal{L}}{\partial \left(\dfrac{\partial \eta_{\jmath}}{\partial t}\right)}\right]+\sum_{k=1}^{k=3}\frac{\partial }{\partial x_{k}}\left[\frac{\partial \mathcal{L}}{\partial \left(\dfrac{\partial \eta_{\jmath}}{\partial x_{k}}\right)}\right]- \frac{\partial \mathcal{L}}{\partial \eta_{\jmath}}=0\:, \quad \left(\jmath=1,2,3,4\right)
\tag{010}
\end{equation}
simplified in notation to
\begin{equation}
\boxed{\:
\dfrac{\partial }{\partial t}\left(\dfrac{\partial \mathcal{L}}{\partial \overset{\centerdot}{\eta}_{\jmath}}\right) + \nabla \boldsymbol{\cdot}\left[\dfrac{\partial \mathcal{L}}{\partial \left(\boldsymbol{\nabla}\eta_{\jmath}\right)}\right]- \frac{\partial \mathcal{L}}{\partial \eta_{\jmath}}=0, \quad \left(\jmath=1,2,3,4\right)
\:}
\tag{011}
\end{equation}
Here the Lagrangian density $\:\mathcal{L}\:$ is a function of
- the four ''field coordinates''
\begin{align} \eta_{1}&=\mathrm{A}_1\left( x_{1},x_{2},x_{3},t\right) \tag{012.1}\\ \eta_{2}&=\mathrm{A}_2\left( x_{1},x_{2},x_{3},t\right) \tag{012.2}\\ \eta_{3}&=\mathrm{A}_3\left( x_{1},x_{2},x_{3},t\right) \tag{012.3}\\ \eta_{4}&=\:\;\phi \left( x_{1},x_{2},x_{3},t\right) \tag{012.4} \end{align}
- their time derivatives
\begin{align} \overset{\centerdot}{\eta}_{1} & \equiv \dfrac{\partial \eta_{1}}{\partial t} =\dfrac{\partial \mathrm{A}_{1}}{\partial t}\equiv\overset{\centerdot}{\mathrm{A}}_{1} \tag{013.1}\\ \overset{\centerdot}{\eta}_{2} & \equiv \dfrac{\partial \eta_{2}}{\partial t} =\dfrac{\partial \mathrm{A}_{2}}{\partial t}\equiv \overset{\centerdot}{\mathrm{A}}_{2} \tag{013.2}\\ \overset{\centerdot}{\eta}_{3} & \equiv \dfrac{\partial \eta_{3}}{\partial t} =\dfrac{\partial \mathrm{A}_{3}}{\partial t}\equiv\overset{\centerdot}{\mathrm{A}}_{3} \tag{013.3}\\ \overset{\centerdot}{\eta}_{4} & \equiv \dfrac{\partial \eta_{4}}{\partial t} =\dfrac{\partial \phi}{\partial t}\equiv\overset{\centerdot}{\phi} \tag{013.4} \end{align}
and
- their gradients
\begin{equation} \begin{array}{cccc} \boldsymbol{\nabla}\eta_{1}=\boldsymbol{\nabla}\mathrm{A}_1 \:,\: & \boldsymbol{\nabla}\eta_{2}=\boldsymbol{\nabla}\mathrm{A}_{2} \:,\: \boldsymbol{\nabla}\eta_{3}=\boldsymbol{\nabla}\mathrm{A}_3 \:,\: & \boldsymbol{\nabla}\eta_{4}=\boldsymbol{\nabla}\phi \end{array} \tag{014} \end{equation}
We express equations (006) and (008) in forms that are similar to the Lagrange equations (011) \begin{equation} \boxed{\: \dfrac{\partial }{\partial t}\left(\nabla \boldsymbol{\cdot} \mathbf{A}\right)+\nabla \boldsymbol{\cdot} \left(\boldsymbol{\nabla}\phi \right) -\left(-\frac{\rho}{\epsilon_{0}}\right) =0 \:} \tag{015} \end{equation} and \begin{equation} \boxed{\: \dfrac{\partial}{\partial t}\left(\frac{\partial \mathrm{A}_{k}}{\partial t}+\frac{\partial \phi}{\partial x_{k}}\right)+\nabla \boldsymbol{\cdot} \left[c^{2}\left(\frac{\partial \mathbf{A}}{\partial x_{k}}- \boldsymbol{\nabla}\mathrm{A}_{k}\right)\right] -\frac{\mathrm{j}_{k}}{\epsilon_{0}}=0 \:} \tag{016} \end{equation} The Lagrange equation (011) for $\:\jmath=4\:$, that is for $\:\eta_{4}=\phi \:$, is \begin{equation} \frac{\partial }{\partial t}\left(\frac{\partial \mathcal{L}}{\partial \overset{\centerdot}{\phi}}\right) + \nabla \boldsymbol{\cdot} \left[\frac{\partial \mathcal{L}}{\partial \left(\boldsymbol{\nabla}\phi\right)}\right]- \frac{\partial \mathcal{L}}{\partial \phi}=0 \tag{017} \end{equation}
Comparing equations (015) and (017), we note that the first could be derived from the second if
\begin{equation}
\dfrac{\partial \mathcal{L}}{\partial \overset{\centerdot}{\phi}}=\nabla \boldsymbol{\cdot} \mathbf{A}\:, \qquad \dfrac{\partial \mathcal{L}}{\partial \left(\boldsymbol{\nabla}\phi\right)}=\boldsymbol{\nabla}\phi \:, \qquad \frac{\partial \mathcal{L}}{\partial \phi}=-\frac{\rho}{\epsilon_{0}}
\tag{018}
\end{equation}
so that the Lagrangian density $\:\mathcal{L}\:$ must contain respectively the terms
\begin{equation}
\mathcal{L}_{\boldsymbol{\alpha_{1}}}\equiv\left(\nabla \boldsymbol{\cdot} \mathbf{A}\right)\overset{\centerdot}{\phi}\:, \qquad \mathcal{L}_{\boldsymbol{\alpha_{2}}}\equiv\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}\:, \qquad \mathcal{L}_{\boldsymbol{\alpha_{3}}}\equiv-\frac{\rho \phi}{\epsilon_{0}}
\tag{019}
\end{equation}
and consequently their sum
\begin{equation}
\mathcal{L}_{\boldsymbol{\alpha}}=\mathcal{L}_{\boldsymbol{\alpha_{1}}}+\mathcal{L}_{\boldsymbol{\alpha_{2}}} +\mathcal{L}_{\boldsymbol{\alpha_{3}}}=\left(\nabla \boldsymbol{\cdot} \mathbf{A}\right)\overset{\centerdot}{\phi}+\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}-\frac{\rho \phi}{\epsilon_{0}}
\tag{020}
\end{equation}
We suppose that an appropriate Lagrangian density $\:\mathcal{L}\:$ would be of the form \begin{equation} \mathcal{L}=\mathcal{L}_{\boldsymbol{\alpha}}+\mathcal{L}_{\boldsymbol{\beta}} \tag{021} \end{equation} and since $\:\mathcal{L}_{\boldsymbol{\alpha}}\:$ produces equation (015), we expect that $\:\mathcal{L}_{\boldsymbol{\beta}}\:$, to be determined, will produce equations (016). This expectation would be right if equations (015) and (016) were decoupled, for example if the first contains $\:\phi $-terms only and the second $\:\mathbf{A} $-terms only. But here this is not the case : $\:\mathcal{L}_{\boldsymbol{\alpha}}\:$ as containing $\:\mathbf{A} $-terms would participate to the production of equations (016) and moreover $\:\mathcal{L}_{\boldsymbol{\beta}}\:$ would participate to the production of equation (015), possibly destroying mutually the production of the equations as we expected. But here we follow a trial and error procedure, which will direct to the right answer as we'll see in the following.
Now, the Lagrange equations (011) for $\:\jmath=k=1,2,3\:$, that is for $\:\eta_{k}=\mathrm{A}_{k} \:$, are \begin{equation} \frac{\partial }{\partial t}\left(\dfrac{\partial \mathcal{L}}{\partial \overset{\centerdot}{\mathrm{A}}_{k}}\right) + \nabla \boldsymbol{\cdot} \left[\dfrac{\partial \mathcal{L}}{\partial \left(\boldsymbol{\nabla}\mathrm{A}_{k}\right)}\right]- \frac{\partial \mathcal{L}}{\partial \mathrm{A}_{k}}=0 \tag{022} \end{equation}
Comparing equations (016) and (022), we note that the first could be derived from the second if \begin{equation} \dfrac{\partial \mathcal{L}}{\partial \overset{\centerdot}{\mathrm{A}}_{k}}= \overset{\centerdot}{\mathrm{A}}_{k}+\frac{\partial \phi}{\partial x_{k}}\:, \quad \dfrac{\partial \mathcal{L}}{\partial \left(\boldsymbol{\nabla}\mathrm{A}_{k}\right)}=c^{2}\left(\frac{\partial \mathbf{A}}{\partial x_{k}}- \boldsymbol{\nabla}\mathrm{A}_{k}\right)\:, \quad \frac{\partial \mathcal{L}}{\partial \mathrm{A}}_{k}=\frac{\mathrm{j}_{k}}{\epsilon_{0}} \tag{023} \end{equation}
From the 1rst of equations (023) the $\:\mathcal{L}_{\boldsymbol{\beta}}\:$ part of the Lagrange density $\:\mathcal{L}\:$ must contain the terms \begin{equation} \frac{1}{2}\left\Vert \overset{\centerdot}{\mathrm{A}}_{k}\right\Vert^{2}+\frac{\partial \phi}{\partial x_{k}}\overset{\centerdot}{\mathrm{A}}_{k}\:, \quad k=1,2,3 \tag{024} \end{equation} and so their sum with respect to $\:k\:$ \begin{equation} \mathcal{L}_{\boldsymbol{\beta_{1}}}\equiv \tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}} \tag{025} \end{equation}
From the 2nd of equations (023) the $\:\mathcal{L}_{\boldsymbol{\beta}}\:$ part of the Lagrange density $\:\mathcal{L}\:$ must contain the terms \begin{equation} \tfrac{1}{2}c^{2}\left[\frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k} -\Vert \boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right] \:, \quad k=1,2,3 \tag{026} \end{equation} and so their sum with respect to $\:k\:$ \begin{equation} \mathcal{L}_{\boldsymbol{\beta_{2}}}\equiv\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[ \frac{\partial \mathbf{A}}{\partial x_{k}}\boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}-\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right] \tag{027} \end{equation} From the 3rd of equations (023) the $\:\mathcal{L}_{\boldsymbol{\beta}}\:$ part of the Lagrange density $\:\mathcal{L}\:$ must contain the terms \begin{equation} \frac{\mathrm{j}_{k}\mathrm{A}_{k}}{\epsilon_{0}} \:, \quad k=1,2,3 \tag{028} \end{equation} and so their sum with respect to $\:k\:$ \begin{equation} \mathcal{L}_{\boldsymbol{\beta_{3}}}\equiv \frac{\mathbf{j} \boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}} \tag{029} \end{equation}
From equations (025), (027) and (029) the $\:\mathcal{L}_{\boldsymbol{\beta}}\:$ part of the Lagrange density $\:\mathcal{L}\:$ is \begin{align} \mathcal{L}_{\boldsymbol{\beta}} & = \mathcal{L}_{\boldsymbol{\beta_{1}}}+\mathcal{L}_{\boldsymbol{\beta_{2}}} +\mathcal{L}_{\boldsymbol{\beta_{3}}} \tag{030}\\ & = \tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\boldsymbol{\nabla}\phi \boldsymbol{\cdot}\mathbf{\dot{A}}+\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[ \frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}-\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right]+\frac{\mathbf{j} \boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}} \nonumber \end{align}
Finally, from the expressions (020) and (030) for the densities $\:\mathcal{L}_{\boldsymbol{\alpha}},\mathcal{L}_{\boldsymbol{\beta}}\:$ the Lagrange density $\:\mathcal{L}=\mathcal{L}_{\boldsymbol{\alpha}}+\mathcal{L}_{\boldsymbol{\beta}}\:$ is \begin{align} \mathcal{L}& = \mathcal{L}_{\boldsymbol{\alpha}} + \mathcal{L}_{\boldsymbol{\beta}} \tag{031}\\ & = \left(\nabla \boldsymbol{\cdot} \mathbf{A}\right)\overset{\centerdot}{\phi}+\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}-\frac{\rho \phi}{\epsilon_{0}} \nonumber\\ & + \tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}+\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[ \frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}-\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right]+\frac{\mathbf{j}\boldsymbol{\cdot}\mathbf{A}}{\epsilon_{0}} \nonumber\\ & \text{(this is a wrong Lagrange density)} \nonumber \end{align}
3. Error-Trial-Final Success
Insertion of this Lagrange density expression in the Lagrange equation with respect to $\:\phi \:$, that is equation (017), doesn't yield equation (006) but
\begin{equation}
-\nabla^{2}\phi-\frac{\partial }{\partial t}\left(2\nabla \boldsymbol{\cdot} \mathbf{A}\right) =\frac{\rho}{\epsilon_{0}}\:, \quad (\textbf{wrong})
\tag{032}
\end{equation}
The appearance of an extra $\:\left( \nabla \boldsymbol{\cdot} \mathbf{A}\right) \:$ is due to the term $\:\left( \boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}\right) \:$ of $\:\mathcal{L}_{\boldsymbol{\beta}}\:$ and that's why the Lagrange density given by equation (031) is not an appropriate one.
In order to resolve this problem we must look at (015), that is (006), from a different point of view as follows \begin{equation} \nabla \boldsymbol{\cdot}\left(\boldsymbol{\nabla}\phi + \mathbf{\dot{A}}\right) -\left(-\frac{\rho}{\epsilon_{0}}\right) =0 \tag{033} \end{equation}
Comparing equations (033) and (017), we note that the first could be derived from the second if in place of (018) we have
\begin{equation}
\dfrac{\partial \mathcal{L}}{\partial \overset{\centerdot}{\phi}}=0\:, \qquad \dfrac{\partial \mathcal{L}}{\partial \left(\boldsymbol{\nabla}\phi\right)}=\boldsymbol{\nabla}\phi + \mathbf{\dot{A}} \:, \qquad \frac{\partial \mathcal{L}}{\partial \phi}=-\frac{\rho}{\epsilon_{0}}
\tag{034}
\end{equation}
so in place of (019) and (020) respectively the equations
\begin{equation}
\mathcal{L}^{\prime}_{\boldsymbol{\alpha_{1}}}\equiv 0\:, \quad \mathcal{L}^{\prime}_{\boldsymbol{\alpha_{2}}}\equiv\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2} +\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}\:, \quad \mathcal{L}^{\prime}_{\boldsymbol{\alpha_{3}}}=\mathcal{L}_{\boldsymbol{\alpha_{3}}}\equiv-\frac{\rho \phi}{\epsilon_{0}}
\tag{035}
\end{equation}
\begin{equation}
\mathcal{L}^{\prime}_{\boldsymbol{\alpha}}=\mathcal{L}^{\prime}_{\boldsymbol{\alpha_{1}}}+\mathcal{L}^{\prime}_{\boldsymbol{\alpha_{2}}} +\mathcal{L}^{\prime}_{\boldsymbol{\alpha_{3}}}=\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}+\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}-\frac{\rho \phi}{\epsilon_{0}}
\tag{036}
\end{equation}
Now, it's necessary to omit from $\:\mathcal{L}_{\boldsymbol{\beta_{1}}}\:$, equation (025), the second term $\:\left( \boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}\right) \:$ since it appears in $\:\mathcal{L}^{\prime}_{\boldsymbol{\alpha_{2}}} \:$, see the second of above equations (035).
So we have in place of (025) \begin{equation} \mathcal{L}^{\prime}_{\boldsymbol{\beta_{1}}}\equiv \tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2} \tag{037} \end{equation} while $\:\mathcal{L}_{\boldsymbol{\beta_{2}}},\mathcal{L}_{\boldsymbol{\beta_{3}}}\:$ remain unchanged as in equations (027) and (029) \begin{align} \mathcal{L}^{\prime}_{\boldsymbol{\beta_{2}}} &=\mathcal{L}_{\boldsymbol{\beta_{2}}}\equiv\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[ \frac{\partial \mathbf{A}}{\partial x_{k}}\boldsymbol{\cdot}\boldsymbol{\nabla}\mathrm{A}_{k}-\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right] \tag{038} \\ \mathcal{L}^{\prime}_{\boldsymbol{\beta_{3}}} &=\mathcal{L}_{\boldsymbol{\beta_{3}}}\equiv \frac{\mathbf{j} \boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}} \tag{039} \end{align}
In place of (030) \begin{align} \mathcal{L}^{\prime}_{\boldsymbol{\beta}} & = \mathcal{L}^{\prime}_{\boldsymbol{\beta_{1}}}+\mathcal{L}^{\prime}_{\boldsymbol{\beta_{2}}} +\mathcal{L}^{\prime}_{\boldsymbol{\beta_{3}}} \tag{040} \\ & = \tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[ \frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}-\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right]+\frac{\mathbf{j} \boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}} \nonumber \end{align} and finally for the new Lagrangian density we have in place of (031)
\begin{align} \mathcal{L}^{\prime}& = \mathcal{L}^{\prime}_{\boldsymbol{\alpha}} + \mathcal{L}^{\prime}_{\boldsymbol{\beta}} \tag{041} \\ & = \tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2} +\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}} -\frac{\rho \phi}{\epsilon_{0}} \nonumber\\ & + \tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[ \frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k} -\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right]+\frac{\mathbf{j} \boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}} \nonumber \end{align}
Density $\:\mathcal{L}^{\prime}\:$ of (041) is obtained from density $\:\mathcal{L}\:$ of (031) if we omit the term $\:\left(\nabla \boldsymbol{\cdot} \mathbf{A}\right)\overset{\centerdot}{\phi}\:$. So $\:\mathcal{L}^{\prime}\:$ is independent of $\:\overset{\centerdot}{\phi}$.
In the following equations the brace over the left 3 terms groups that part of the density $\:\mathcal{L}^{\prime}\:$ that essentially participates to the production of the electromagnetic equation (006) from the Lagrange equation with respect to $\:\phi \:$, equation (017), while the brace under the right 4 terms groups that part of the density $\:\mathcal{L}^{\prime}\:$ that essentially participates to the production of the electromagnetic equations (008) from the Lagrange equations with respect to $\:\mathrm{A}_{1},\mathrm{A}_{2},\mathrm{A}_{3} \:$, equation (022).
\begin{equation*} \mathcal{L}^{\prime}=\overbrace{\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}-\frac{\rho \phi}{\epsilon_{0}}+\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}}^{\text{with respect to }\phi}+\tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[\frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}-\Vert \boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right]+\frac{\mathbf{j} \boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}} \end{equation*}
\begin{equation*} \mathcal{L}^{\prime}=\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}-\frac{\rho \phi}{\epsilon_{0}}+\underbrace{\boldsymbol{\nabla}\phi\boldsymbol{\cdot} \mathbf{\dot{A}}+\tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\tfrac{1}{2}c^{2}\sum^{k=3}_{k=1}\left[\frac{\partial \mathbf{A}}{\partial x_{k}} \boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}-\Vert\boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}\right]+\frac{\mathbf{j}\boldsymbol{\cdot} \mathbf{A}}{\epsilon_{0}}}_{\text{with respect to }\mathbf{A}} \end{equation*}
Note the common term $\:\left( \boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}\right)$.
Reordering the terms in the expression (041) of the density $\:\mathcal{L}^{\prime}\:$ we have \begin{equation} \mathcal{L}^{\prime}=\underbrace{\tfrac{1}{2}\left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\tfrac{1}{2}\Vert \boldsymbol{\nabla}\phi \Vert^{2}+\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}}_{\tfrac{1}{2}\left\Vert - \boldsymbol{\nabla}\phi -\frac{\partial \mathbf{A}}{\partial t}\right\Vert^{2}}-\tfrac{1}{2}c^{2}\underbrace{\sum^{k=3}_{k=1}\left[\Vert \boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}-\frac{\partial \mathbf{A}}{\partial x_{k}}\boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}\right]}_{\left\Vert \boldsymbol{\nabla} \boldsymbol{\times} \mathbf{A}\right\Vert^{2}}+\frac{1}{\epsilon_{0}}\left( -\rho \phi + \mathbf{j}\boldsymbol{\cdot} \mathbf{A}\right) \end{equation} \begin{equation} ----------------- \tag{042} \end{equation}
that is \begin{equation} \mathcal{L}^{\prime}=\tfrac{1}{2}\left|\!\left|- \boldsymbol{\nabla}\phi -\frac{\partial \mathbf{A}}{\partial t}\right|\!\right|^{2}-\tfrac{1}{2}c^{2}\left|\!\left| \boldsymbol{\nabla} \boldsymbol{\times} \mathbf{A}\right|\!\right|^{2}+\frac{1}{\epsilon_{0}}\left( -\rho \phi + \mathbf{j} \boldsymbol{\cdot} \mathbf{A}\right) \tag{043} \end{equation} or \begin{equation} \mathcal{L}^{\prime}=\frac{\left|\!\left|\mathbf{E}\right|\!\right|^{2}-c^{2}\left|\!\left|\mathbf{B}\right|\!\right|^{2}}{2}+\frac{1}{\epsilon_{0}}\left( -\rho \phi + \mathbf{j}\boldsymbol{\cdot}\mathbf{A}\right) \tag{044} \end{equation}
Now, if the density $\:\mathcal{L}^{\prime}\:$ must have dimensions of energy per unit volume we define $\:\mathcal{L}_{em}=\epsilon_{0}\mathcal{L}^{\prime} \:$ so \begin{equation} \boxed{\: \mathcal{L}_{em}\:=\:\epsilon_{0}\cdot\dfrac{\left|\!\left|\mathbf{E}\right|\!\right|^{2}-c^{2}\left|\!\left|\mathbf{B}\right|\!\right|^{2}}{2}-\rho \phi + \mathbf{j} \boldsymbol{\cdot} \mathbf{A} \:} \tag{045} \end{equation} having in mind that \begin{align} \left\Vert\mathbf{E}\right\Vert^{2} & = \left\Vert - \boldsymbol{\nabla}\phi -\dfrac{\partial \mathbf{A}}{\partial t}\right\Vert^{2} = \left\Vert \mathbf{\dot{A}}\right\Vert^{2}+\Vert \boldsymbol{\nabla}\phi \Vert^{2}+2\left(\boldsymbol{\nabla}\phi \boldsymbol{\cdot} \mathbf{\dot{A}}\right) \tag{046a}\\ & \nonumber\\ \left\Vert\mathbf{B}\right\Vert^{2} & = \left\Vert\boldsymbol{\nabla} \boldsymbol{\times} \mathbf{A}\right\Vert^{2}=\sum^{k=3}_{k=1}\left[\Vert \boldsymbol{\nabla}\mathrm{A}_{k}\Vert^{2}-\dfrac{\partial \mathbf{A}}{\partial x_{k}}\boldsymbol{\cdot} \boldsymbol{\nabla}\mathrm{A}_{k}\right] \tag{046b} \end{align}
The scalar $\:\left(\left|\!\left|\mathbf{E}\right|\!\right|^{2}-c^{2}\left|\!\left|\mathbf{B}\right|\!\right|^{2}\right)\:$ is one of the two Lorentz invariants (2) of the field (the other is $\:\mathbf{E}\boldsymbol{\cdot}\mathbf{B}$) essentially equal to a constant times $\:\mathcal{E}_{\mu\nu}\mathcal{E}^{\mu\nu}\:$, where $\:\mathcal{E}^{\mu\nu}\:$ the antisymmetric field(2) tensor.
On the other hand, the scalar $\: \left(-\rho \phi + \mathbf{j} \boldsymbol{\cdot} \mathbf{A}\right)\:$ is essentially the inner product $\:J_{\mu}A^{\mu}\:$ in Minkowski space of two 4-vectors : the 4-current density $\:J^{\mu}=\left(c\rho,\mathbf{j}\right)\:$ and the 4-potential $\:A^{\mu}=\left(\phi/c,\mathbf{A}\right)\:$ , a Lorentz invariant scalar too.
So, the Lagrange density $\:\mathcal{L}_{em}\:$ in equation (045) is Lorentz invariant.
(1) By a trial and error procedure I found the Lagrangian in a more difficult and complicated case : see my answer as user82794 here Obtain the Lagrangian from the system of coupled equation
(2) Following W.Rindler in "Introduction to Special Relativity" Ed.1982, this tensor is derived in equation (38.15) \begin{equation} \mathcal{E}_{\mu\nu}= \begin{bmatrix} 0 & E_{1} & E_{2} & E_{3} \\ -E_{1} & 0 & -cB_{3} & cB_{2} \\ -E_{2} & cB_{3} & 0 & -cB_{1} \\ -E_{3} & -cB_{2} & cB_{1} & 0 \end{bmatrix} \quad \text{so} \quad \mathcal{E}^{\mu\nu}= \begin{bmatrix} 0 & -E_{1} & - E_{2} & -E_{3} \\ E_{1} & 0 & -cB_{3} & cB_{2} \\ E_{2} & cB_{3} & 0 & -cB_{1} \\ E_{3} & -cB_{2} & cB_{1} & 0 \end{bmatrix} \tag{38.15} \end{equation} which by making the (duality) replacements $\:\mathbf{E}\to -c\mathbf{B}\:$ and $\:c\mathbf{B}\to \mathbf{E}\:$ yields \begin{equation} \mathcal{B}_{\mu\nu}= \begin{bmatrix} 0 & -cB_{1} & -cB_{2} & - cB_{3} \\ cB_{1} & 0 & -E_{3} & E_{2} \\ cB_{2} & cE_{3} & 0 & -E_{1} \\ cB_{3} & -E_{2} & E_{1} & 0 \end{bmatrix} \quad \text{so} \quad \mathcal{B}^{\mu\nu}= \begin{bmatrix} 0 & cB_{1} & cB_{2} & cB_{3} \\ -cB_{1} & 0 & -E_{3} & E_{2} \\ -cB_{2} & cE_{3} & 0 & -E_{1} \\ -cB_{3} & -E_{2} & E_{1} & 0 \end{bmatrix} \tag{39.05} \end{equation} The two invariants of $\:\mathcal{E}^{\mu\nu}\:$-immediately recognizable as such from their mode of formation - can be expressed as follows: \begin{align} X & =\dfrac{1}{2}\mathcal{E}_{\mu\nu}\mathcal{E}^{\mu\nu}=-\dfrac{1}{2}\mathcal{B}_{\mu\nu}\mathcal{B}^{\mu\nu}=c^{2}\left|\!\left|\mathbf{B}\right|\!\right|^{2}-\left|\!\left|\mathbf{E}\right|\!\right|^{2} \tag{39.06}\\ Y & =\dfrac{1}{4}\mathcal{B}_{\mu\nu}\mathcal{E}^{\mu\nu}=c\mathbf{B}\boldsymbol{\cdot}\mathbf{E} \tag{39.07} \end{align}
Ultimately the reasoning must be that (as you stated) it must be constructed so the Euler-Lagrange equations are Maxwell's equations. So in a sense you have to guess the Lagrangian that produces this as is done here for example.
However you can get some guidance from the fact that we need to construct a Lagrangian for a massless non self interacting field. So we need a gauge and lorentz invariant combination of the 4-vector potential which only has a kinetic term (quadratic in derivatives of the fields). You are then not left with many options apart from $F^{\mu\nu}F_{\mu\nu}$. The source term is then trivial to add in if needed.
I'm almost 100% sure the Lagrangian is an assumption of the theory. It cannot be derived. I don't have any references for this claim. I just know that from every course I've been taught and every book I've read, the Lagrangian (assuming it is being used at all) is where you start. It is the "first principle" in this case.