Why do we need to reverse a function in the convolution operation?
Well, shifting alone is enough, in the sense that reversing does not change the mathematical object of convolution in any essential way. But we likely adopt the reversing definition as the convention because of several conveniences. For example:
(1) The property of commutativity, that is, $f*g=g*f$, is lost without reversing;
(2) The property that convolution is multiplication on the Fourier side, that is, $\mathcal{F}(f*g)=\mathcal{F}(f)\mathcal{F}(g)$ where $\mathcal{F}$ denotes the Fourier transform, is lost without reversing;
Etc.
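To make (1) concrete, here is a minimal numerical check (assuming NumPy; the arrays `f` and `g` are arbitrary examples). `np.convolve` reverses one argument while `np.correlate` does not, and only the reversing operation is commutative:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

# Convolution reverses one argument, so it is commutative: f*g == g*f.
print(np.allclose(np.convolve(f, g), np.convolve(g, f)))    # True

# Cross-correlation skips the reversal, and commutativity is lost.
print(np.allclose(np.correlate(f, g, mode="full"),
                  np.correlate(g, f, mode="full")))          # False
```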
Here is one way to discover the discrete convolution operation. Let $S: \mathbb R^n \to \mathbb R^n$ be the circular shift operator defined by $$ S \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_0 \end{bmatrix}. $$ Assume that the linear operator $A:\mathbb R^n \to \mathbb R^n$ is "shift-invariant" in the sense that $A(Sx) = S(Ax)$ for all $x$. In other words, if you shift the input, the output is shifted the same way. We can easily check that $A S^k x = S^k A x$ for all integers $k$, positive or negative. Shift-invariant linear operators are of fundamental importance; for example, they arise in signal processing and numerical differentiation.
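As a quick sanity check (a sketch assuming NumPy; the forward-difference operator is just one convenient shift-invariant example from numerical differentiation), we can verify the commutation relation numerically:

```python
import numpy as np

n = 6
# Circular shift matrix: (Sx)_i = x_{(i+1) mod n}, matching the definition above.
S = np.roll(np.eye(n), -1, axis=0)
# Forward-difference operator, a typical shift-invariant A.
A = S - np.eye(n)

x = np.random.default_rng(0).standard_normal(n)
print(np.allclose(A @ (S @ x), S @ (A @ x)))   # True: A(Sx) = S(Ax)
```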
Key idea: Let's find the solution to $Ax = b$, assuming that we have already found a "fundamental solution" $x_0$ which satisfies $$ Ax_0 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = e_0. $$
We are off to a good start if we notice that \begin{align} A S^{-1} x_0 = S^{-1} A x_0 = S^{-1} e_0 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = e_1 \end{align} and in fact $$ A S^{-j} x_0 = e_j \quad \text{for } j = 0,\ldots,n-1. $$ It follows that \begin{align} A(b_0 x_0 + b_1 S^{-1} x_0 + b_2 S^{-2} x_0 + \cdots + b_{n-1} S^{-(n-1)} x_0) &= b_0 e_0 + b_1 e_1 + \cdots + b_{n-1} e_{n-1} \\&=b. \end{align} We have solved $Ax = b$.
The solution to $Ax = b$ is thus a particular combination of $x_0$ and $b$: componentwise, $x_i = \sum_{j=0}^{n-1} b_j (S^{-j} x_0)_i = \sum_{j=0}^{n-1} b_j (x_0)_{(i-j) \bmod n}$, where the reversed index $i - j$ comes from the shift $S^{-j}$. This combination is called the "convolution" of $x_0$ and $b$, and is denoted $x_0 \ast b$. We have discovered the convolution operation. This explains why we care about convolution, and why it is defined the way it is.
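Here is a sketch of the whole argument in code (assuming NumPy; the circulant matrix `A` and the seed are illustrative choices). We build a shift-invariant `A`, compute the fundamental solution `x0`, form the combination $\sum_j b_j S^{-j} x_0$ directly, and check that it solves $Ax = b$:

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)

# A circulant matrix A[i, j] = c[(i - j) mod n] commutes with the circular shift S.
c = rng.standard_normal(n)
A = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

e0 = np.zeros(n); e0[0] = 1.0
x0 = np.linalg.solve(A, e0)        # fundamental solution: A x0 = e0
b = rng.standard_normal(n)

# The combination sum_j b_j S^{-j} x0, i.e. the circular convolution
# (x0 * b)[i] = sum_j b[j] * x0[(i - j) mod n].
x = np.array([sum(b[j] * x0[(i - j) % n] for j in range(n)) for i in range(n)])

print(np.allclose(A @ x, b))       # True: the convolution solves Ax = b
```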
We can discover the convolution of functions using a similar line of reasoning, with the delta function $\delta$ in place of $e_0$.
It depends on what you want out of it: just as there are different ways to define the Fourier transform, which have very similar but slightly different algebraic properties, one can also define different convolution-type integrals. The most common alternative to convolution is cross-correlation, which is frequently used in signal processing.
\begin{align} \text{Convolution:}&& [f*g](t) &= \int f(\tau)g(t-\tau)\,d\tau = \int f(t-\tau)g(\tau)\,d\tau\\ \text{Cross-correlation:}&& [f\star g](t) &=\int \overline{f(\tau)}g(t+\tau)\,d\tau=\int \overline{f(\tau-t)}g(\tau)\,d\tau \end{align}
Cross-correlation is nice since it measures how similar two signals are, making it very useful in applications. But mathematicians typically use convolution instead, since it offers the nicest package of algebraic properties.
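For instance, here is a small sketch (assuming NumPy; the signal length, lag, and noise level are arbitrary) of cross-correlation doing its signal-processing job: its peak recovers the lag at which one signal best matches the other.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(200)
g = np.roll(f, 37) + 0.1 * rng.standard_normal(200)   # f delayed by 37 samples, plus noise

# Circular cross-correlation via the FFT: corr[t] = sum_tau conj(f[tau]) g[tau + t].
corr = np.fft.ifft(np.conj(np.fft.fft(f)) * np.fft.fft(g)).real
print(np.argmax(corr))   # 37: the peak locates the delay
```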
In particular, what stands out to me is that the Dirac delta $\delta$ acts like a multiplicative unit under convolution ($f*\delta = f$ for all $f$), which leads to the beautiful solution theory of linear PDEs.
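In the discrete, circular setting this is easy to see numerically (a sketch assuming NumPy): the "discrete delta" $e_0$ is the unit for circular convolution.

```python
import numpy as np

n = 16
f = np.random.default_rng(2).standard_normal(n)
delta = np.zeros(n); delta[0] = 1.0   # discrete analogue of the Dirac delta

# Circular convolution with delta via the FFT returns f unchanged: f * delta = f.
conv = np.fft.ifft(np.fft.fft(f) * np.fft.fft(delta)).real
print(np.allclose(conv, f))   # True
```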
On the other hand, properties like the convolution theorem $\mathcal F(f*g) = \mathcal F(f)\cdot \mathcal F(g)$ that shrinklemma mentioned are not that unique; for example, cross-correlation satisfies the analogous identity $\mathcal F(f\star g) = \overline{\mathcal F(f)}\cdot \mathcal F(g)$.
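Both identities are easy to check in the circular (DFT) setting, where they hold exactly; a sketch assuming NumPy, with random complex signals:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)
F, G = np.fft.fft(f), np.fft.fft(g)

t = np.arange(n)
# Circular convolution and cross-correlation, computed directly from the sums.
conv = np.array([np.sum(f * g[(k - t) % n]) for k in t])
corr = np.array([np.sum(np.conj(f) * g[(t + k) % n]) for k in t])

print(np.allclose(np.fft.fft(conv), F * G))           # convolution theorem
print(np.allclose(np.fft.fft(corr), np.conj(F) * G))  # cross-correlation analogue
```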