What is a gauge in a gauge theory?
In normal usage, a gauge is a particular choice, or specification, of vector and scalar potentials $\mathbf A$ and $\phi$ which will generate a given set of physical force fields $\mathbf E$ and $\mathbf B$.
More specifically, a physical situation is specified by the electric and magnetic fields, $\mathbf E$ and $\mathbf B$. A set of potentials $\mathbf A$ and $\phi$ generates the force fields if it obeys the equations \begin{align} \mathbf B & =\nabla\times\mathbf A \\ \mathbf E & = -\nabla\phi-\frac{\partial \mathbf A}{\partial t}. \end{align} As you know, for a given set of force fields, the potentials are not unique. A gauge is a specific, additional requirement on the potentials. One good example of a gauge is the Coulomb gauge, which is mostly embodied by the requirement that $\mathbf A$ also be divergenceless, $$\nabla \cdot\mathbf A=0.$$ "The Coulomb gauge" refers to the set of potentials which satisfy this.
Gauges are usually thought of as specifying the potentials uniquely. This is not strictly true, but they do tend to specify the potentials "uniquely up to reasonable physical assumptions". The Coulomb gauge is a good example of this: the gauge transformation to \begin{align} \mathbf A'&=\mathbf A+\nabla \chi(\mathbf r)\\ \phi'&=\phi \end{align} preserves the physical fields (note that $\chi$ is time-independent, so $\mathbf E$ is unchanged), and if $$\nabla^2 \chi(\mathbf r)=0$$ then it also preserves the gauge condition that $\nabla \cdot\mathbf A'=0$. This is not great for uniqueness, because there are many harmonic functions that satisfy the above condition. However, a nonconstant function that is harmonic throughout all of space - with no exceptions and no singularities - must grow without bound at infinity, which is not really palatable in most cases. Because of that, saying that $\mathbf A$ is the vector potential in the Coulomb gauge usually means that $\nabla \cdot\mathbf A=0$ and that such 'infinite-self-energy' terms have been set to zero; this is usually a unique set of potentials in situations where the energy in the physical fields themselves is not infinite.
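To make the residual freedom concrete, here is a minimal numerical sketch in plain Python. Everything in it is an illustrative assumption rather than part of the argument above: the uniform field strength `B0`, the symmetric-gauge potential `A`, the particular harmonic function $\chi = x^2 - y^2$, and the finite-difference step `H`. It checks that $\mathbf A' = \mathbf A + \nabla\chi$ has the same curl as $\mathbf A$ and is still divergenceless, while $|\nabla\chi|$ keeps growing as you move away from the origin.

```python
import math

H = 1e-3   # finite-difference step (illustrative choice)
B0 = 2.0   # strength of an assumed uniform field along z

def partial(f, i, r):
    """Central-difference derivative of the scalar function f along axis i at r."""
    rp, rm = list(r), list(r)
    rp[i] += H
    rm[i] -= H
    return (f(rp) - f(rm)) / (2 * H)

def div(vec, r):
    """Numerical divergence of the vector field vec at r."""
    return sum(partial(lambda q: vec(q)[i], i, r) for i in range(3))

def curl(vec, r):
    """Numerical curl of the vector field vec at r."""
    d = lambda i, j: partial(lambda q: vec(q)[j], i, r)  # d_i vec_j
    return [d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0)]

def A(r):
    """Coulomb-gauge potential for a uniform field B0 along z."""
    x, y, z = r
    return [-0.5 * B0 * y, 0.5 * B0 * x, 0.0]

def grad_chi(r):
    """Analytic gradient of chi = x**2 - y**2, which is harmonic (2 - 2 = 0)."""
    x, y, z = r
    return [2.0 * x, -2.0 * y, 0.0]

def A_prime(r):
    """Gauge-transformed potential A + grad(chi)."""
    return [a + g for a, g in zip(A(r), grad_chi(r))]

def grad_norm_chi(r):
    """Magnitude of grad(chi) at r."""
    return math.sqrt(sum(g * g for g in grad_chi(r)))

r0 = [0.3, -1.2, 0.7]
print("curl A     :", curl(A, r0))
print("curl A'    :", curl(A_prime, r0))
print("div A'     :", div(A_prime, r0))
print("|grad chi| at distance 1 and 1000:",
      grad_norm_chi([1.0, 0.0, 0.0]), grad_norm_chi([1000.0, 0.0, 0.0]))
```

Both potentials are perfectly good Coulomb-gauge potentials for the same field; the only thing that singles out `A` is that the extra piece in `A_prime` grows without bound.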
It is worth noting that, in certain situations, the word gauge can be naturally free of this ambiguity. In my field, strong-field physics, the words 'length gauge' and 'velocity gauge' are taken to mean that the total energy of an electron interacting with a laser field, at position $\mathbf r$ and with momentum $\mathbf p$, is of the form $$E=\tfrac1{2m}\mathbf p^2-e\mathbf r\cdot \mathbf E$$ and $$E=\tfrac1{2m}\left(\mathbf p-e\mathbf A\right)^2,$$ respectively. For a uniform field (i.e. in the 'dipole approximation') the two energies are equivalent via a gauge transformation. In this usage the word 'gauge' is unambiguous, up to an overall constant energy which can very safely be ignored.
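As a rough classical illustration of that equivalence (a sketch under stated assumptions, not how strong-field calculations are actually done), one can integrate Hamilton's equations in one dimension for both energy functions and compare the trajectories. The pulse shape `A(t)`, the units `m = e = 1`, the initial momentum `p_can`, and the hand-rolled `rk4` stepper are all hypothetical choices made for this example. The field is taken as $E = -\partial A/\partial t$ with no scalar potential in the velocity gauge, and the canonical momenta are related by $p_\text{length} = p_\text{velocity} - eA(t)$.

```python
import math

m, e = 1.0, 1.0          # illustrative units
A0, omega = 0.5, 2.0     # assumed amplitude and frequency of the uniform field

def A(t):
    """Assumed vector potential of the uniform (dipole-approximation) field."""
    return A0 * math.sin(omega * t)

def E(t):
    """Electric field E = -dA/dt."""
    return -A0 * omega * math.cos(omega * t)

def rk4(rhs, y, t, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = rhs(t, y)."""
    k1 = rhs(t, y)
    k2 = rhs(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k1)])
    k3 = rhs(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k2)])
    k4 = rhs(t + dt, [yi + dt * k for yi, k in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def length_rhs(t, y):
    """Hamilton's equations for H = p^2/2m - e x E(t)."""
    x, p = y
    return [p / m, e * E(t)]

def velocity_rhs(t, y):
    """Hamilton's equations for H = (p - e A(t))^2 / 2m."""
    x, p = y
    return [(p - e * A(t)) / m, 0.0]

def evolve(rhs, y0, t_end, n):
    """Integrate from t = 0 to t_end in n equal RK4 steps."""
    y, dt = list(y0), t_end / n
    for k in range(n):
        y = rk4(rhs, y, k * dt, dt)
    return y

T, N = 3.0, 3000
p_can = 0.3              # canonical momentum in the velocity gauge (conserved)

x_vel, p_vel = evolve(velocity_rhs, [0.0, p_can], T, N)
x_len, p_len = evolve(length_rhs, [0.0, p_can - e * A(0.0)], T, N)

print("position, velocity gauge:", x_vel)
print("position, length gauge  :", x_len)
print("p_length vs p_velocity - e A(T):", p_len, p_vel - e * A(T))
```

The positions agree to integration accuracy, and the two canonical momenta differ by exactly $eA(t)$, which is what the gauge transformation predicts for a uniform field.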
So much for the technical matters. I think, though, that a lot of what worries you is the word 'gauge' itself, which is indeed a weird choice. In everyday usage, a gauge is a generic form of meter or dial. The phrase 'gauge invariance' seems to have come into physics via German, in Hermann Weyl's use of the word 'Eichinvarianz', which loosely means 'scale invariance' or 'gauge invariance'. The sense is that a choice of measuring instrument (a gauge) determines the measured physical values in a given setting, i.e. it determines the scale.
This invariance under changes of scale is (part of) the technical gauge invariance of general relativity, a theory which is invariant under coordinate transformations.
Note, though, that my source for this history is Wikipedia, so if someone can chime in with a better source it would be fantastic.
Continuous symmetries of the action of a system which are global, that is, do not depend on where they act, give rise through Noether's theorem to conserved quantities. For example, a translation in time $t \to t+\epsilon$ for $\epsilon \in \mathbb{R}$ is a global transformation, and leads to energy conservation.
On the other hand, if an action is invariant under local, or gauge, transformations, which do depend on the point at which they act, then the system possesses a redundancy. For example, consider the Lagrangian
$$\mathcal{L}=-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}$$
which describes electromagnetism, where $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. Here we have a gauge symmetry,
$$A_\mu \to A_\mu + \partial_\mu \epsilon(x)$$
since the field-strength $F$ will be the same. To convince yourself, write out the field-strength explicitly:
$$F'_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + \partial_\mu\partial_\nu \epsilon(x) - \partial_\nu\partial_\mu\epsilon(x)=\partial_\mu A_\nu - \partial_\nu A_\mu = F_{\mu\nu}$$
since $[\partial_\mu,\partial_\nu]\epsilon=0$. So, if I have a system with a 4-potential $A_\mu$, my action cannot distinguish it from a system whose $A_\mu$ differs by a gradient $\partial_\mu \epsilon(x)$. Going slightly beyond your question, note that a gauge symmetry often lets us simplify the problem. If we choose to identify $A_\mu$ and $A'_\mu$ as the same system, then for any $A_\mu$ we can always make it satisfy,
$$\partial_\mu A^\mu =0$$
by choosing the right $\epsilon(x)$ such that $\partial_\mu \partial^\mu \epsilon(x)=-\partial_\mu A^\mu$. We call the condition $\partial_\mu A^\mu=0$ the 'gauge' or 'gauge condition.' This particular gauge is due to Lorenz.
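If you would rather check this numerically than by hand, here is a small sketch in plain Python. The metric signature $(+,-,-,-)$, the sample potential $A_\mu = (0,\, x + ty,\, 0,\, 0)$, and the gauge function $\epsilon = t^2/2$ are all assumptions picked for the example; with them, $\partial_\mu A^\mu = -1$ and $\partial_\mu\partial^\mu\epsilon = +1$, so $A'_\mu = A_\mu + \partial_\mu\epsilon$ should satisfy the Lorenz condition while leaving $F_{\mu\nu}$ untouched.

```python
H = 1e-3                          # finite-difference step (illustrative)
ETA = [1.0, -1.0, -1.0, -1.0]     # assumed metric signature (+,-,-,-)

def d(f, mu, x):
    """Central-difference derivative of the scalar function f along x^mu."""
    xp, xm = list(x), list(x)
    xp[mu] += H
    xm[mu] -= H
    return (f(xp) - f(xm)) / (2 * H)

def A(x):
    """Sample potential A_mu (lower indices), chosen only for illustration."""
    t, X, Y, Z = x
    return [0.0, X + t * Y, 0.0, 0.0]

def grad_eps(x):
    """Analytic d_mu eps for eps = t**2 / 2, which has box(eps) = 1."""
    t, X, Y, Z = x
    return [t, 0.0, 0.0, 0.0]

def A_prime(x):
    """Gauge-transformed potential A_mu + d_mu eps."""
    return [a + g for a, g in zip(A(x), grad_eps(x))]

def field_strength(pot, x):
    """F_{mu nu} = d_mu A_nu - d_nu A_mu, evaluated numerically at x."""
    return [[d(lambda q: pot(q)[nu], mu, x) - d(lambda q: pot(q)[mu], nu, x)
             for nu in range(4)] for mu in range(4)]

def lorenz_divergence(pot, x):
    """d_mu A^mu = eta^{mu mu} d_mu A_mu (summed over mu) at x."""
    return sum(ETA[mu] * d(lambda q: pot(q)[mu], mu, x) for mu in range(4))

x0 = [0.7, -0.2, 1.3, 0.4]
F = field_strength(A, x0)
F_prime = field_strength(A_prime, x0)
max_diff = max(abs(F[m][n] - F_prime[m][n]) for m in range(4) for n in range(4))

print("largest change in F under the gauge transformation:", max_diff)
print("d_mu A^mu  before:", lorenz_divergence(A, x0))
print("d_mu A'^mu after :", lorenz_divergence(A_prime, x0))
```

The field strength is the same for both potentials up to rounding, and only the transformed potential has vanishing four-divergence.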