Diffeomorphism group vs. $GL(4,\mathbb{R})$ in General Relativity
Let there be given a 4-dimensional real manifold$^1$ $M$. As OP says, the set ${\rm Diff}(M)$ is the group of globally defined $C^{\infty}$-diffeomorphisms $f:M\to M$. The set ${\rm Diff}(M)$ is an infinite-dimensional Lie group. (To actually explain mathematically what the previous sentence means, one would have to define what an infinite-dimensional manifold is, which is beyond the scope of this answer.)
There is also the groupoid ${\rm LocDiff}(M) \supseteq {\rm Diff}(M)$ of locally defined $C^{\infty}$-diffeomorphisms $f:U\to V$ (i.e. the invertible morphisms in the category). Here $U,V\subseteq M$ are open neighborhoods (i.e. objects in the category).
The above is part of the active picture. Conversely, in the passive picture, there is the groupoid $LCT(M)$ of local coordinate transformations $f:U\to V$, where $U,V\subseteq\mathbb{R}^4$. Heuristically, due to the dual active & passive formulations, the two groupoids ${\rm LocDiff}(M)$ and $LCT(M)$ must be closely related at "the microscopic level". (We leave it to the reader to try to make the previous sentence precise.)
Next let us consider the frame bundle $F(TM)$ of the tangent bundle $TM$. It is a principal bundle with structure group $GL(4,\mathbb{R})$, which is a 16-dimensional Lie group.
Given two locally defined sections $$(e_0, e_1, e_2, e_3), (e^{\prime}_0, e^{\prime}_1, e^{\prime}_2, e^{\prime}_3)~\in~ \Gamma(F(TM_{|W})), \tag{1}$$ in some neighborhood $W\subseteq M$, then there is a locally defined $GL(4,\mathbb{R})$-valued section $$\Lambda~\in~\Gamma(GL(4,\mathbb{R})\to W), \tag{2}$$ such than the two sections (1) are related via$^2$ $$e^{\prime}_{b}~=~ \sum_{a=0}^3 e_{a} \Lambda^{a}{}_{b}, \qquad b~\in~\{0,1,2,3\} \tag{3}. $$
Conversely, given only one of the sections in eq. (1), we can use an arbitrary $GL(4,\mathbb{R})$-valued section (2) to define the other frame via eq. (3).
Let us now return to the groupoid $LCT(M)$ and see how the structure group $GL(4,\mathbb{R})$ comes in. In details, let there be given two local coordinate charts $U,U^{\prime}\subseteq M$, with non-empty overlap $U\cap U^{\prime}\neq \emptyset$, and with local coordinates $(x^0,x^1,x^2,x^3)$ and $(x^{\prime 0},x^{\prime 1},x^{\prime 2},x^{\prime 3})$, respectively. Then we have two locally defined sections $$\left(\frac{\partial}{\partial x^0}, \frac{\partial}{\partial x^1}, \frac{\partial}{\partial x^2},\frac{\partial}{\partial x^3}\right)~\in~ \Gamma(F(TM_{|U})), $$ $$\left(\frac{\partial}{\partial x^{\prime 0}}, \frac{\partial}{\partial x^{\prime 0}}, \frac{\partial}{\partial x^{\prime 0}},\frac{\partial}{\partial x^{\prime 0}}\right)~\in~ \Gamma(F(TM_{|U^{\prime}})), \tag{4}$$ in the frame bundle. The analogue of the $GL(4,\mathbb{R})$-valued section (2) is given by the (inverse) Jacobian matrix$^1$ $$ \Lambda^{\mu}{}_{\nu} ~=~\frac{\partial x^{\mu}}{\partial x^{\prime \nu}},\qquad \mu,\nu~\in~\{0,1,2,3\}, \tag{5} $$ cf. the chain rule.
Conversely, note that not all $GL(4,\mathbb{R})$-valued sections (2) are of the form of a Jacobian matrix (5). Given one local coordinate system $(x^0,x^1,x^2,x^3)$ and given a $GL(4,\mathbb{R})$-valued section (2), these two inputs do not necessarily define another local coordinate system $(x^{\prime 0},x^{\prime 1},x^{\prime 2},x^{\prime 3})$. The $GL(4,\mathbb{R})$-valued section (2) in that case evidently needs to satisfy the following integrability condition $$ \frac{\partial (\Lambda^{-1})^{\nu}{}_{\mu}}{\partial x^{\lambda}} ~=~ (\mu \leftrightarrow \lambda). \tag{6}$$
So far we have just discussed an arbitrary $4$-manifold $M$ without any structure. For the rest of this answer let us consider GR, namely we should equip $M$ with a metric $g_{\mu\nu}\mathrm{d}x^{\mu}\odot \mathrm{d}x^{\nu}$ of signature $(3,1)$.
Similarly, we introduce a Minkowski metric $\eta_{ab}$ in the standard copy $\mathbb{R}^4$ used in the bundle $GL(4,\mathbb{R})\to W$. We now restrict to orthonormal frames (aka. as (inverse) tetrads/vierbeins) $$(e_0, e_1, e_2, e_3)~\in~ \Gamma(F(TM_{|W})), \tag{7}$$ i.e. they should satisfy the orthonormal condition $$ e_a\cdot e_b~=~\eta_{ab}. \tag{8} $$
Correspondingly, the structure group $GL(4,\mathbb{R})$ of the frame bundle $F(TM)$ is replaced by the proper Lorentz group $SO(3,1;\mathbb{R})$, which is a $6$-dimensional Lie group. In particular the sections (2) are replaced by locally defined $SO(3,1;\mathbb{R})$-valued sections $$\Lambda~\in~\Gamma(SO(3,1;\mathbb{R})\to W). \tag{9}$$ This restriction is needed in order to ensure the existence of finite-dimensional spinorial representations, which in turn is needed in order to describe fermionic matter in curved space. See also e.g. this Phys.SE post and this MO.SE post.
Consider a covariant/geometric action functional $$S[g, \ldots; V]~=~\int_V \! d^4~{\cal L} \tag{10}$$ over a spacetime region $V\subseteq M$, i.e. $S[g, \ldots; V]$ is independent of local coordinates, i.e. invariant under the groupoid $LCT(M)$. The action $$S[g, \ldots; V]~=~S[f^{\ast}g, \ldots; f^{-1}(V)] \tag{11}$$ is then also invariant under pullback with locally defined diffeomorphisms $f\in{\rm LocDiff}(M)$.
In summary, the symmetries of GR are:
- Pullbacks by the group ${\rm Diff}(M)$ of globally defined diffeomorhisms.
- Pullbacks by the groupoid ${\rm LocDiff}(M)$ of locally defined diffeomorphisms.
- The groupoid $LCT(M)$ of local coordinate transformations, and
- The local $SO(3,1;\mathbb{R})$ Lorentz transformations (9) of the tetrads/vierbeins.
--
$^1$ In most of this answer, we shall use the language of a differential geometer where e.g. a point/spacetime-event $p\in M$ or, say, a worldline have an absolute geometric meaning. However, the reader should keep in mind that a relativist would say that two physical situations which differ by an active global diffeomorphism are physically equivalent/indistinguishable, and hence a point/spacetime-event $p\in M$ does only have an relative geometric meaning.
$^2$ Conventions: Greek indices $\mu,\nu,\lambda, \ldots,$ are so-called curved indices, while Roman indices $a,b,c, \ldots,$ are so-called flat indices.
A very good answer to the above question has already been provided here, where Marek summarises the differences between the symmetry group of a theory and the groups of coordinates transformations leaving the equations invariant.
In a nutshell (but it is more complex) let $f\colon U\to V$ be any coordinates transformation on charts of a manifold $U,V\subset\mathcal{M}$ (i. e. a change of coordinates). Under such transformation fields $\phi(x)$ are sent into $\phi'(f(x)) = S(x)\phi(x)$.
In order the equations of motion to be satisfied, one must require certain appropriate conditions on the factor $S(x)$ (in particular one can see that these could be related to the representations of the underlying group of transformations $f$). The set of all allowed operators $S(x)$ defines the symmetry group of the theory for the general mapping $f$ as defined above. In the case of general relativity $f$ are the diffeomorphisms and $S(x)$ span the general linear group (up to isomorphisms and cartesian products).