Rigorous procedure of gluing together two spacetimes
Boy oh boy you're in luck because it's one of my favorite topic!
You are indeed correct, there isn't a whole lot of literature on the topic of rigorous treatment of spacetime gluing. There's a whole pile of articles and books to peruse to get some unified picture of what this involves. Not particularly rigorous by the way but it's a good idea to peruse Synge's paper on the topic, as it's one of the very first one and doesn't assume any pre-existing notions on the topic.
First of all, let's consider the topic of gluing manifolds together. Gluing manifolds can refer to two different things : either you are identifying points of a boundary, or points of an open set. In most contexts in GR, we're talking about identifying boundary points, so let's focus on that. Some good references on the topic are Wall and Milnor.
Generally speaking, gluing manifolds is talked about with two manifolds with boundaries, but for the sake of generality, I'll talk about gluing a single (possibly disconnected) manifold, which gives more general applications.
Take a manifold $M$, which contains two connected elements $S_1$ and $S_2$ in its boundary $\partial M$. $S_1$ and $S_2$ here are diffeomorphic. Via the collared neighbourhood theorem, both those boundaries have a collared neighbourhood, that is, a neighbourhood diffeomorphic to $S_i \times [0,1)$ (basically define a Riemannian metric on the manifold, and do a map from geodesics orthogonal to the boundary). We have two maps :
$$c_i : S_i \times [0,1) \to M$$
As the boundaries are themselves manifolds, you can also take the atlas of the boundaries to form a map
$$c_i : \mathbb{R}^{n-1} \times [0,1) \to M$$
This will be useful to form the atlas later on. Take a gluing function $h$, defined as a diffeomorphism
$$h : S_1 \to S_2$$
and then define an equivalence relation
$$\sim_h = \left\{ (p,q) | \text{if $p \in S_1$ and $q \in S_2$, $h(p) = q$, else $p = q$} \right\}$$
The glued manifold is then the quotient
$$\bar{M} = M / \sim_h$$
This can be shown to be a manifold. Every point from the original manifold belongs to the same atlas as previously, so we only need to do a chart for the junction, which can be done by considering the two collared neighbourhoods and the map
\begin{equation} \begin{array}{r@{}l} c : \mathbb{R}^{n-1} \times (-1,1) &{}\to M\\ (x, r) &{}\mapsto c(x,r) = \begin{cases} c_1(x,r) & r \geq 0 \\ c_2(h(x),-r) & r \leq 0 \end{cases} \end{array} \end{equation}
The overlap with other charts only happens for $r > 0$ or $r < 0$, so that the transition is always smooth. We therefore have a smooth manifold out of it.
If we want to get the gluing of a manifold that already has a metric on it, things get trickier. We need to not only glue the manifold but also the tensor bundle. If the original manifold has some tensor bundle $(T^r_sM, M, \pi)$, then we'll consider the glued manifold
$$T^r_s M / \sim_\psi$$
where $\psi$ is similarly a diffeomorphism on the boundary of $T^r_s M$, such that $\psi = h_*$, the pushforward of $h$, gluing the boundaries $\pi^{-1}(S_i)$ together. The original metric $g$ gives rise to a continuous metric $\bar{g}$ if $h_* g(S_1) = g(S_2)$. You can find some more details on the topic in Takiguchi.
From that point on, we are dealing with a $C^0$ metric most of the time (I suppose technically you could even dealt with a discontinuous metric if you wanted to, but as far as I know nobody bothered with that). This is bad news for a whole variety of reasons and from that point onward things are getting awful. We can't use most of differential geometry as it is usually taught from that point onward.
First, as the metric is simply continuous, we can't assume differentiability, so that we have to resort to weak derivatives, the theory for which can be found in Geroch and Traschen. Roughly speaking, assuming all things orientable (so that integration on tensor fields make sense), we define tensor test fields in the same manner as scalar ones. A tensor test field $\mathfrak{t}$ is a tensor density of weight $-1$ and of rank $(r,s)$ with compact support, so that there is some compact subset $\Omega$ of $M$ such that
$$\mathfrak{t}(M \setminus \Omega) = 0$$
A tensor distribution of rank $(s,r)$ is then a linear functional from tensor densities of rank $(r,s)$ to $\mathbb{R}$.
\begin{array}{r@{}l} T : \mathscr T^r_s(M) &{}\to \mathbb{R}\\ \mathfrak{t} &{}\mapsto T(\mathfrak{t}) \end{array}
Similarly to distributions, any tensor field $T \in T^s_r$ can be mapped to a distribution via the map
\begin{equation} T[\mathfrak{t}] = \int {T^{abc...}}_{\alpha\beta\gamma...} {\mathfrak{t}^{\alpha\beta\gamma...}}_{abc...} \end{equation}
Then we will consider the Geroch-Traschen class of metrics : if a metric is a locally bounded, $C^0$, admits an inverse that is also locally bounded, and its first weak derivative is locally square integrable, then it is of the Geroch-Traschen class. You can check that this means that the Riemann tensor is then a tensor distribution.
There is a theorem (found in said paper) which says that, given a distributional metric of this class, the distribution of matter must be concentrated on a hypersurface, which is quite nice as it is what we'll get here. If the metric was any less regular we'd have to deal with even worse things such as Colombeau algebras, as can happen for such things as string distributions.
Alright then. As A.V.S. mentionned, Mars and Senovilla's paper on the topic is a very good paper for what happens to the metric at this point, as well as Clarke and Dray. You may have noticed that, from the coordinate patch around the junction, we have a coordinate $r \in (-1,1)$ such that its negative values are entirely in the first region of the junction while positive values are in the second. On that patch, we can define the metric as
$$g = \xi^+ g^+ + \xi^- g^-$$
where $g^{\pm}$ is the value of the metric in the first/second region of the junction, and $\xi^\pm$ is an indicator function, such that $\xi^+ = \theta(r)$ and $\xi^- = (1 - \theta(r))$, $\theta$ the step function such that $\theta(0) = 1/2$.
The derivatives of the metric are then
$$g_{ab,c} = \theta(r) g^+_{ab,c} + (1 - \theta(r)) g^-_{ab,c} + \theta_{,c}(r) (g^+_{ab} - g^-_{ab})$$
By continuity, the third term disappears since the two quantities are equal on the support of $\theta_,c$. From there we can define the Christoffel symbols as
$${\Gamma^a}_{bc} = \theta(r) {\Gamma^{a}}^+_{bc} + (1 - \theta(r)) {\Gamma^{a}}^-_{bc}$$
Now let's consider coordinates in a bit more detail for convenience here. We have a hypersurface $S_1 \sim S_2$, with some coordinate system $x_i \in \mathbb{R}^{n-1}$, as well as the coordinate $r$ defined by the distance to that hypersurface along orthogonal geodesics. Let's assume that the hypersurface is timelike and the normal vector $n^\mu$ is spacelike here (I do it because it's the case for all my favorite spacetimes. The process is very much the same if it is reversed, and a bit more complex if null).
From Clarke and Dray, you can check out that there is quite a long and tedious proof in section 3 in which you can construct a local coordinate system such that, in this atlas, the metric is of the form
$$ds^2 = dr^2 + {}^{(n-1)}g_{ab} dx^a dx^b$$
with $x$ corresponding to coordinates on the hypersurface for $r = 0$.
This helps out because it allows us to compute the discontinuity of the derivative as
\begin{equation} [g_{ab,c}] = \gamma_{ab} n_c \end{equation}
for some tensor $\gamma$. Then we have that the discontinuity of the Christoffel symbols is
\begin{equation} [{\Gamma^\sigma}_{\mu\nu}] = n_\mu {\gamma^\sigma}_\nu + n_\nu {\gamma^\sigma}_\mu - n^\sigma \gamma_{\mu\nu} \end{equation}
And, through some additional computations, the Ricci tensor and Ricci scalar end up as
\begin{equation} R_{\mu\nu} = \xi^+ R^+_{\mu\nu} + \xi^- R^-_{\mu\nu} + \delta \rho_{\mu\nu} \end{equation}
\begin{equation} R = \xi^+ R^+ + \xi^- R^- + \delta (n^\mu n^\nu \gamma_{\mu\nu} - n^\mu n_\mu {\gamma^\nu}_\nu) \end{equation}
with
\begin{equation} \rho_{\mu\nu} = 2 n^\sigma \gamma_{\sigma(\mu} n_{\nu)} - n^\sigma n_\sigma \gamma_{\mu\nu} - n_\mu n_\nu {\gamma^\sigma}_\sigma \end{equation}
which leads to the stress-energy tensor as
\begin{equation} T_{\mu\nu} = \xi^+ T_{\mu\nu}^+ + \xi^- T_{\mu\nu}^- + \delta \tau_{\mu\nu} \end{equation}
with the discontinuity
\begin{equation} \tau_{\mu\nu} = n^\sigma \gamma_{\sigma(\mu} n_{\nu)} - \frac{1}{2}[n^\sigma n_\sigma \gamma_{\mu\nu} + n_\mu n_\nu {\gamma^\sigma}_\sigma + g_{\mu\nu } (n^\sigma n^\rho \gamma_{\sigma\rho} - n^\sigma n_\sigma {\gamma^\rho}_\rho)] \end{equation}
Our trials and tribulations are far from over, unfortunately. You may be aware that a very important aspect of spacetimes is the existence of convex normal neighbourhood, guaranteed by the existence and uniqueness of geodesics in a small enough neighbourhood, there again guaranteed by the fact that the connection is Lipschitz continuous. This is not the case anymore here, and we'd best get to prove it here if we don't want to throw most of general relativity away.
Unfortunately, the geodesic equation here is in very bad shape, as it is of the form
$$\dot{y}(\lambda) = F(y(\lambda))$$
with $y = (x^\mu, u^\mu)$ and $F(y) = (u^\mu, -{\Gamma^\mu}_{\alpha\beta}(x^\mu) u^\alpha u^\beta)$, so that $F$ is discontinuous with respect to $y$. There is fortunately a theory for dealing with differential equations that pathological, which is the theory of Filippov differential inclusion. The basic idea being that rather than finding some equality for the derivative of the solution (as that may be ill-defined at discontinuities), we only require that the solution itself be continuous (it would be weird to have discontinuous geodesics) and that it belongs to some subset :
$$\dot{y}(\lambda) \in \mathcal{F}[F](y(\lambda))$$
where we have the essential convex hull of our function :
$$\mathcal{F}\left[F\right](y) = \bigcap_{\delta > 0} \bigcap_{\mu(S) = 0} \overline{\operatorname{co}}(F(B(y, \delta) \setminus S))$$
This roughly means that we're taking the smallest interval of values around that point. For instance, most importantly for us, $\mathcal{F}[\theta(0)] = [0,1]$.
It's fairly hard to find a lot of theorems on the topic, but we're in luck because in addition to a nice introduction on the topic, we even have some theorems [1][2] for spacetimes with such connections. The most important one being for us that, given a spacetime with a $C^{0,1}$ (Lipschitz continuous) metric, then the geodesic equation does indeed have some solutions which are $C^1$ curves. Given some more regularity, we can even insure uniqueness.
As far as I'm aware, there is no proof that for any gluing of a spacetime, this regularity is guaranteed, but it does seem to be true for at least some of them.
There are probably a whole lot more one could say on the topic of glued spacetimes, but the references given should be a good enough source to find more on the topic. Of more or less rigor, I also recommend Visser and Poisson on the topic.