What, in simplest terms, is gauge invariance?

The reason that it's so hard to understand what physicists mean when they talk about "gauge freedom" is that there are at least four inequivalent definitions that I've seen used:

  • Definition 1: A mathematical theory has a gauge freedom if some of the mathematical degrees of freedom are "redundant" in the sense that two different mathematical expressions describe the exact same physical system. Then the redundant (or "gauge dependent") degrees of freedom are "unphysical" in the sense that no possible experiment could uniquely determine their values, even in principle. One famous example is the overall phase of a quantum state - it's completely unmeasurable and two vectors in Hilbert space that differ only by an overall phase describe the exact same state. Another example, as you mentioned, is any kind of potential which must be differentiated to yield a physical quantity - for example, a potential energy function. (Although some of your other examples, like temperature, are not examples of gauge-dependent quantities, because there is a well-defined physical sense of zero temperature.)

    For physical systems that are described by mathematical structures with a gauge freedom, the best way to mathematically define a specific physical configuration is as an equivalence class of gauge-dependent functions which differ only in their gauge degrees of freedom. For example, in quantum mechanics, a physical state isn't actually described by a single vector in Hilbert space, but rather by an equivalence class of vectors that differ by an overall scalar multiple. Or more simply, by a line of vectors in Hilbert space. (If you want to get fancy, the space of physical states is called a "projective Hilbert space," which is the set of lines in Hilbert space, or more precisely a version of the Hilbert space in which vectors are identified if they are proportional to each other.) I suppose you could also define "physical potential energies" as sets of potential energy functions that differ only by an additive constant, although in practice that's kind of overkill. These equivalence classes remove the gauge freedom by construction, and so are "gauge invariant."

    Sometimes (though not always) there's a simple mathematical operation that removes all the redundant degrees of freedom while preserving all the physical ones. For example, given a potential energy, one can take the gradient to yield a force field, which is directly measurable. And in the case of classical E&M, there are certain linear combinations of partial derivatives that reduce the potentials to directly measurable ${\bf E}$ and ${\bf B}$ fields without losing any physical information. However, in the case of a vector in a quantum Hilbert space, there's no simple derivative operation that removes the phase freedom without losing anything else.

  • Definition 2: The same as Definition 1, but with the additional requirement that the redundant degrees of freedom be local. What this means is that there exists some kind of mathematical operation that depends on an arbitrary smooth function $\lambda(x)$ on spacetime that leaves the physical degrees of freedom (i.e. the physically measurable quantities) invariant. The canonical example of course is that if you take any smooth function $\lambda(x)$, then adding $\partial_\mu \lambda(x)$ to the electromagnetic four-potential $A_\mu(x)$ leaves the physical quantities (the ${\bf E}$ and ${\bf B}$ fields) unchanged. (In field theory, the requirement that the "physical degrees of freedom" are unchanged is phrased as requiring that the Lagrangian density $\mathcal{L}[\varphi(x)]$ be unchanged, but other formulations are possible.) This definition is clearly much stricter - the examples given above in Definition 1 don't count under this definition - and most of the time when physicists talk about "gauge freedom" this is the definition they mean. In this case, instead of having just a few redundant/unphysical degrees of freedom (like the overall constant for your potential energy), you have a continuously infinite number. (To make matters even more confusing, some people use the phrase "global gauge symmetry" in the sense of Definition 1 to describe things like the global phase freedom of a quantum state, which would clearly be a contradiction in terms in the sense of Definition 2.)

    It turns out that in order to deal with this in quantum field theory, you need to substantially change your approach to quantization (technically, you need to "gauge fix your path integral") in order to eliminate all the unphysical degrees of freedom. When people talk about "gauge invariant" quantities under this definition, in practice they usually mean the directly physically measurable derivatives, like the electromagnetic tensor $F_{\mu \nu}$, that remain unchanged ("invariant") under any gauge transformation. But technically, there are other gauge-invariant quantities as well, e.g. a uniform quantum superposition of $A_\mu(x) + \partial_\mu \lambda(x)$ over all possible $\lambda(x)$ for some particular $A_\mu(x).$

    See Terry Tao's blog post for a great explanation of this second sense of gauge symmetry from a more mathematical perspective.

  • Definition 3: A Lagrangian is sometimes said to possess a "gauge symmetry" if there exists some operation that depends on an arbitrary continuous function on spacetime that leaves it invariant, even if the degrees of freedom being changed are physically measurable.

  • Definition 4: For a "lattice gauge theory" defined on local lattice Hamiltonians, there exists an operator supported on each lattice site that commutes with the Hamiltonian. In some cases, this operator corresponds to a physically measurable quantity.

The cases of Definitions 3 and 4 are a bit conceptually subtle so I won't go into them here - I can address them in a follow-up question if anyone's interested.
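To make Definition 1 concrete, here is a minimal numerical sketch of the two examples used there. The helper names (`expectation`, `force`), the chosen state, and the harmonic potential are illustrative assumptions of mine, not anything canonical. It checks that rescaling a state vector by an arbitrary nonzero complex number leaves expectation values unchanged, and that shifting a potential energy by a constant leaves the force unchanged.

```python
import cmath
import math

def expectation(op, psi):
    """Return <psi|op|psi> / <psi|psi> for a square matrix `op` and a vector `psi`."""
    n = len(psi)
    op_psi = [sum(op[i][j] * psi[j] for j in range(n)) for i in range(n)]
    numerator = sum(psi[i].conjugate() * op_psi[i] for i in range(n))
    norm_sq = sum(abs(c) ** 2 for c in psi)
    return (numerator / norm_sq).real

# Two Pauli matrices as example observables.
sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]

# An arbitrary qubit state, and the same state times an arbitrary complex scalar.
psi = [0.6 + 0j, 0.8 * cmath.exp(0.3j)]
scale = 2.5 * cmath.exp(1.7j)
psi_rescaled = [scale * c for c in psi]

def force(V, x, h=1e-5):
    """Force F = -dV/dx, approximated by a central difference."""
    return -(V(x + h) - V(x - h)) / (2 * h)

def V(x):
    return 0.5 * x * x      # harmonic potential (illustrative)

def V_shifted(x):
    return V(x) + 42.0      # same physics, different "gauge"

if __name__ == "__main__":
    for name, op in (("X", sigma_x), ("Z", sigma_z)):
        print(name, expectation(op, psi), expectation(op, psi_rescaled))
    print("force:", force(V, 1.3), force(V_shifted, 1.3))
```

Both members of each printed pair should agree up to rounding, which is the sense in which the overall scalar and the additive constant are redundant.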

Update: I've written follow-up answers regarding whether there's any sense in which the gauge degrees of freedom can be physically measurable in the Hamiltonian case and the Lagrangian case.

I only understood this after taking a class in general relativity (GR), differential geometry and quantum field theory (QFT). The essence is just a change of coordinate systems that needs to be reflected in the derivative. I'll explain what I mean.

You have a theory that is invariant under some symmetry group. So in quantum electrodynamics you have a Lagrangian density for the fermions (no photons yet) $$ \mathcal L = \bar\psi(x) [\mathrm i \gamma^\mu \partial_\mu - m] \psi(x) \,.$$ Here $\bar\psi$ is just $\psi^\dagger \gamma^0$; the important point is that it is complex conjugated. The fact that $\psi$ has four components in spin space is of no concern here. What one can do now is transform $\psi \to \exp(\mathrm i \alpha) \psi$ with some $\alpha \in \mathbb R$. Then $\bar\psi \to \bar\psi \exp(-\mathrm i \alpha)$ and the Lagrangian is invariant, because the derivative does not act on the exponential function: it is just a constant phase factor. There you have a global symmetry.

Now promote the symmetry to a local one, why not? Instead of a global $\alpha$ one now has $\alpha(x)$. This means we choose a different $\alpha$ at each point in spacetime. The problem is that when we transform now, one picks up the $\partial_\mu \alpha(x)$ with the chain and product rules of differentiation. That seems like a technical complication at first.

There is a more telling way to see this:
You take a derivative of a field $\psi(x)$. This means taking a difference quotient like $$ \partial_\mu \psi(x) = \lim_{\epsilon \to 0} \frac{\psi(x + \epsilon \vec e_\mu) - \psi(x)}{\epsilon} \,.$$ This works just fine with a global transformation. But with the local transformation, you basically subtract two values that are gauged differently. In differential geometry, the tangent spaces at different points of the manifold are different, and therefore one cannot just compare vectors by their components. One needs a connection with connection coefficients to provide parallel transport. It is similar here. We have now promoted $\psi$ from living on $\mathbb R^4$ to living in the bundle $\mathbb R^4 \times S^1$, as we have a U(1) gauge group. Therefore we need some sort of connection in order to transport the transformed $\psi$ from $x + \epsilon \vec e_\mu$ to $x$. This is where one has to introduce a connection, which is $$ \partial_\mu \to \mathrm D_\mu := \partial_\mu + \mathrm i A_\mu \,.$$
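The "subtracting two differently gauged values" problem can be seen on a computer. The following is only a sketch on a one-dimensional lattice, which is my own simplification and not part of the argument above: the lattice spacing `a`, the link variables `links[n] = exp(i a A_n)` playing the role of the parallel transporter, and all function names are assumptions chosen for illustration. It compares the naive difference quotient with one that first transports $\psi_{n+1}$ back to site $n$.

```python
import cmath
import math

def naive_difference(psi, a):
    """Plain forward difference (psi_{n+1} - psi_n) / a, with no parallel transport."""
    return [(psi[n + 1] - psi[n]) / a for n in range(len(psi) - 1)]

def covariant_difference(psi, links, a):
    """Forward difference with parallel transport: (U_n * psi_{n+1} - psi_n) / a."""
    return [(links[n] * psi[n + 1] - psi[n]) / a for n in range(len(psi) - 1)]

def gauge_transform(psi, links, alpha):
    """psi_n -> e^{i alpha_n} psi_n and U_n -> e^{i alpha_n} U_n e^{-i alpha_{n+1}}."""
    new_psi = [cmath.exp(1j * alpha[n]) * psi[n] for n in range(len(psi))]
    new_links = [cmath.exp(1j * (alpha[n] - alpha[n + 1])) * links[n]
                 for n in range(len(links))]
    return new_psi, new_links

def bilinear(psi, diffs):
    """The combination psi_n^* (difference)_n, a stand-in for psi-bar D psi."""
    return [psi[n].conjugate() * d for n, d in enumerate(diffs)]

a = 0.1                                                # lattice spacing (illustrative)
N = 8
psi = [(1 + 0.1 * n) * cmath.exp(0.3j * n) for n in range(N)]
A = [0.5 * math.sin(n) for n in range(N - 1)]          # arbitrary gauge field on links
links = [cmath.exp(1j * a * A_n) for A_n in A]
alpha = [0.7 * n * n for n in range(N)]                # a different phase at every site

psi2, links2 = gauge_transform(psi, links, alpha)

def same(u, v, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(u, v))

covariant_ok = same(bilinear(psi, covariant_difference(psi, links, a)),
                    bilinear(psi2, covariant_difference(psi2, links2, a)))
naive_ok = same(bilinear(psi, naive_difference(psi, a)),
                bilinear(psi2, naive_difference(psi2, a)))

if __name__ == "__main__":
    print("covariant difference invariant:", covariant_ok)
    print("naive difference invariant:   ", naive_ok)
```

With a site-dependent `alpha`, only the version that includes the link variable gives a combination that is unchanged by the local phase rotation. Expanding `links[n]` to first order in `a` recovers $\partial_\mu + \mathrm i A_\mu$.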

If you plug that into the Lagrange density to make it $$ \mathcal L = \bar\psi(x) [\mathrm i \gamma^\mu \mathrm D_\mu - m] \psi(x)$$ and let the connection transform as $A_\mu \to A_\mu - \partial_\mu \alpha$ together with $\psi \to \exp(\mathrm i \alpha(x)) \psi$, you will see that the Lagrangian density does stay invariant even under local transformations, as the shift in the connection coefficient just cancels the unwanted term from the product/chain rule.
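Spelled out, as a sketch under the convention $\mathrm D_\mu = \partial_\mu + \mathrm i A_\mu$ used here, and assuming the connection shifts as $A_\mu \to A'_\mu = A_\mu - \partial_\mu \alpha$ while $\psi \to \psi' = \exp(\mathrm i \alpha(x)) \psi$ (the sign of the shift is tied to that convention), the cancellation reads $$ \begin{aligned} \partial_\mu \psi' &= \mathrm e^{\mathrm i \alpha} \left[ \partial_\mu \psi + \mathrm i (\partial_\mu \alpha) \psi \right] , \\ \mathrm D'_\mu \psi' &= \left( \partial_\mu + \mathrm i A'_\mu \right) \psi' = \mathrm e^{\mathrm i \alpha} \left[ \partial_\mu + \mathrm i (\partial_\mu \alpha) + \mathrm i A_\mu - \mathrm i (\partial_\mu \alpha) \right] \psi = \mathrm e^{\mathrm i \alpha} \, \mathrm D_\mu \psi , \\ \mathcal L' &= \bar\psi \, \mathrm e^{-\mathrm i \alpha} \left[ \mathrm i \gamma^\mu \mathrm D'_\mu - m \right] \mathrm e^{\mathrm i \alpha} \psi = \bar\psi \left[ \mathrm i \gamma^\mu \mathrm D_\mu - m \right] \psi = \mathcal L \,. \end{aligned} $$ The first line shows the unwanted term $\mathrm i (\partial_\mu \alpha) \psi$ that the plain derivative picks up. The second shows that the covariant derivative of $\psi$ transforms with the same phase factor as $\psi$ itself, which is exactly what makes the phases cancel in the third.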

In general relativity you have the symmetry under arbitrary diffeomorphisms; the price is that you have to change the derivative to a covariant derivative built from a connection, $$ \partial \to \nabla := \partial + \Gamma + \cdots \,.$$

Since you mentioned coming from a mathematics background, you might find it nice to take an answer in terms of equivalence classes.

A gauge theory is a physical theory where the observable quantities, as in, things you could measure with an experiment given perfect measuring equipment, are equivalence classes in a vector space.

Electromagnetism is the most common example. Modern physics theories are always written as fiber bundles where the underlying manifold is spacetime and the fibers are some tangent space associated with each point (called an event) in spacetime. E&M in free space (no charges present) is described by associating a four-component object called $A_{\mu}$ to each spacetime point, $x$, and requiring $A_{\mu}(x)$ to satisfy Maxwell's equations.

However, the observable, i.e. actually measurable, quantities in nature are the electric and magnetic fields, $\vec{E}(x)$ and $\vec{B}(x)$. These are derived from $A_{\mu}(x)$ using the definition given in this wiki (look at the matrix elements of $F_{\mu \nu}(x)$).

It turns out that the transformation $A_{\mu}(x) \rightarrow A_{\mu}(x) + \partial_{\mu}f(x)$ for any twice differentiable function $f(x)$ gives the same values of the observable fields $\vec{E}(x)$ and $\vec{B}(x)$. So there is an equivalence relation

$A_{\mu}(x) \approx A_{\mu}(x) + \partial_{\mu} f(x)$.

And in general, gauge theories are theories where the observable quantities are functions on equivalence classes of some vectors in a vector space. In this case our vectors were $A_{\mu}(x)$ (these are vectors in the function space of twice differentiable functions on spacetime), and our equivalence relation was given above.
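The equivalence relation can be checked numerically. The sketch below is a rough illustration, not a derivation: the particular potential `A`, the gauge function `f`, the sample point, and the step size `H` are arbitrary choices of mine. It evaluates $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ by central differences before and after replacing $A_\mu$ with $A_\mu + \partial_\mu f$.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def shifted(x, mu, delta):
    """Return a copy of the point x with coordinate mu moved by delta."""
    y = list(x)
    y[mu] += delta
    return tuple(y)

def partial(g, mu, x):
    """Central-difference approximation to the derivative of scalar g along axis mu."""
    return (g(shifted(x, mu, H)) - g(shifted(x, mu, -H))) / (2 * H)

def field_strength(A, x):
    """F[mu][nu] = d_mu A_nu - d_nu A_mu, where A(x) returns four components."""
    def component(nu):
        return lambda y: A(y)[nu]
    return [[partial(component(nu), mu, x) - partial(component(mu), nu, x)
             for nu in range(4)] for mu in range(4)]

def A(x):
    """An arbitrary smooth four-potential (made up for this example)."""
    t, x1, x2, x3 = x
    return [math.sin(x1) * t, x2 * x3, math.cos(t) * x1, x1 * x2 * t]

def f(x):
    """An arbitrary smooth gauge function (made up for this example)."""
    t, x1, x2, x3 = x
    return math.exp(-x1 * x1) * math.sin(3 * t) + x2 * x3 ** 2

def A_gauged(x):
    """The gauge-equivalent potential A_mu + d_mu f."""
    return [A(x)[mu] + partial(f, mu, x) for mu in range(4)]

point = (0.3, -0.2, 0.5, 1.1)
F = field_strength(A, point)
F_gauged = field_strength(A_gauged, point)

# How much the potential moved versus how much the field strength moved.
potential_change = max(abs(A(point)[m] - A_gauged(point)[m]) for m in range(4))
max_change = max(abs(F[m][n] - F_gauged[m][n]) for m in range(4) for n in range(4))

if __name__ == "__main__":
    print("largest change in A_mu:   ", potential_change)
    print("largest change in F_mu_nu:", max_change)
```

The potential changes by an amount of order one, while the field strength, whose components are the $\vec{E}$ and $\vec{B}$ fields, changes only at the level of floating-point rounding. The mechanism is simply that mixed partial derivatives of $f$ commute.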

As to your final question, whether things like the total energy of a system being determined only up to an additive constant in any reference frame make Newtonian dynamics a gauge theory: the answer is no, not really. Basically, if you're not talking about a field theory, a physicist won't call the thing a gauge theory.