Distributive Law and how it works
Foundations: Asymmetry in definitions
When mathematics is built up from the very basics, we practically define multiplication so that it is distributive.
On the natural numbers $1,2,\dots$ you start with:
$$1\cdot n = n\\ (m+1)\cdot n = (m\cdot n) + n$$
This is the recursive way of saying that $m\cdot n$ is the sum of $m$ copies of $n$, but we already see that the second rule is looking like the distributive law.
From this definition of multiplication, we can prove both directions of the distributive law more easily than we can prove that multiplication is commutative.
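As a rough illustration of the recursive definition above, here is a minimal Python sketch. The function name `mul` and the test range are my own choices, and a finite spot-check is of course not a proof; it only shows the two rules behaving as claimed.

```python
def mul(m, n):
    """Multiply positive integers using only the two defining rules and addition."""
    if m == 1:
        return n              # rule 1:  1 * n = n
    return mul(m - 1, n) + n  # rule 2:  (m + 1) * n = (m * n) + n


# Spot-check both distributive laws and commutativity on a small range.
N = range(1, 8)
left_dist = all(mul(a, b + c) == mul(a, b) + mul(a, c)
                for a in N for b in N for c in N)
right_dist = all(mul(a + b, c) == mul(a, c) + mul(b, c)
                 for a in N for b in N for c in N)
commutes = all(mul(a, b) == mul(b, a) for a in N for b in N)

print(left_dist, right_dist, commutes)
```

Note that right distributivity, $(a+b)\cdot c = a\cdot c + b\cdot c$, is essentially the second rule applied repeatedly, which is one way to see why it is the easier direction to prove.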
Example: Matrices
In general, we start with a relatively easier notion of "plus" or addition. For example, adding matrices is very simple:
$$\begin{pmatrix}a_1&b_1\\ c_1&d_1\end{pmatrix}+\begin{pmatrix}a_2&b_2\\ c_2&d_2\end{pmatrix}=\begin{pmatrix}a_1+a_2&b_1+b_2\\ c_1+c_2&d_1+d_2\end{pmatrix} $$
Multiplication is a much uglier and, when you first learn it, less obvious thing. Matrix multiplication is not even commutative.
$$\begin{pmatrix}a_1&b_1\\ c_1&d_1\end{pmatrix}\cdot\begin{pmatrix}a_2&b_2\\ c_2&d_2\end{pmatrix}=\begin{pmatrix}a_1a_2+b_1c_2&a_1b_2+b_1d_2\\ c_1a_2+d_1c_2&c_1b_2+d_1d_2\end{pmatrix} $$
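To make this concrete, the following sketch implements the two $2\times 2$ formulas above with plain tuples and checks one example. The helper names `mat_add` and `mat_mul` and the matrices `A`, `B`, `C` are arbitrary choices of mine, not anything from the question.

```python
def mat_add(x, y):
    """Entrywise sum of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(x[i][j] + y[i][j] for j in range(2)) for i in range(2))


def mat_mul(x, y):
    """Row-by-column product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


A = ((1, 2), (3, 4))
B = ((0, 1), (1, 0))
C = ((2, 0), (5, -1))

# Multiplication does not commute for this pair...
commutes = mat_mul(A, B) == mat_mul(B, A)

# ...but it still distributes over addition on both sides.
left = mat_mul(A, mat_add(B, C)) == mat_add(mat_mul(A, B), mat_mul(A, C))
right = mat_mul(mat_add(A, B), C) == mat_add(mat_mul(A, C), mat_mul(B, C))

print(commutes, left, right)  # False True True
```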
Geometric meaning
So the deep "why" is in the complexity - multiplication is usually defined in terms of addition.
Multiplication of real numbers is an example of this. If real numbers are thought of geometrically as lengths, then to understand addition of positive reals, you can "stay on a line." The only geometric views of multiplication of reals require a step into a higher dimension - you need to talk about areas, or use parallel lines.
The geometry also just makes more sense for the original distributive law - it means that you can join two rectangles along edges of the same length to get a larger rectangle, and the new rectangle has area the sum of the areas of the two original rectangles.
There is no such geometric meaning for $a+bc$ - it even fails a basic "units" test. Here $bc$ is a measure of area, while $a$ is a measure of length. It is a wonder that we can turn area back into a length at all to add $a$ to $bc$, but the geometry is all wrong to get $a+bc=(a+b)(a+c)$.
Distributive Lattices: Algebras with symmetry
There isn't a way to "make" two defined operations both distribute, but there are algebras with two operations where both distribute, such as distributive lattices and Boolean algebras.
You could create two operations on $\mathbb R,$ $+_m$ and $\cdot_m$ which are defined as:
$$a+_m b = \min(a,b)\\ a\cdot_m b = \max(a,b)$$
This is just treating the order on the reals as a distributive lattice.
For lattices in general, we generally use the operator symbols $\cap$ and $\cup$ or $\land$ and $\lor$. These symbols indicate the symmetry. You'd never see a lattice using $+$ and $\cdot$, because they don't fit the "pattern" of addition and multiplication. (In Boolean algebras, you occasionally see $\cap$ written as $\cdot$, but $+$ is something weirder in that case.)
These two operators are very different from the usual $+,\cdot$. There is no identity for either operation (although you could adjoin $+\infty$ and $-\infty$ as identities), and you have $a\cdot_m a = a+_ma=a$.
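A quick numerical check of the $\min$/$\max$ example, as a sketch only: the names `add_m` and `mul_m` mirror $+_m$ and $\cdot_m$, and the sample values are arbitrary.

```python
from itertools import product


def add_m(a, b):
    """The operation written +_m above: the minimum."""
    return min(a, b)


def mul_m(a, b):
    """The operation written ._m above: the maximum."""
    return max(a, b)


samples = [-2.5, -1.0, 0.0, 0.5, 3.0]
triples = list(product(samples, repeat=3))

# "Multiplication" distributes over "addition" ...
mul_over_add = all(mul_m(a, add_m(b, c)) == add_m(mul_m(a, b), mul_m(a, c))
                   for a, b, c in triples)
# ... and "addition" distributes over "multiplication" as well.
add_over_mul = all(add_m(a, mul_m(b, c)) == mul_m(add_m(a, b), add_m(a, c))
                   for a, b, c in triples)
# Both operations are idempotent, unlike ordinary + and *.
idempotent = all(add_m(a, a) == a == mul_m(a, a) for a in samples)

print(mul_over_add, add_over_mul, idempotent)  # True True True
```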
Deep dive into Algebra: Rings of endomorphisms
This will really only make sense if you have taken some algebra.
Given any Abelian group $(A,+)$, the set $R=\mathrm{End}(A)$ of homomorphisms $A\to A$ forms a ring with identity by defining $(f+g)(a)=f(a)+g(a)$ and $(f\cdot g)(a)=(f\circ g)(a)=f(g(a))$.
It turns out, every ring $R$ is basically a subset of such an endomorphism ring. The distributive law is what makes that possible:
$$R\hookrightarrow \mathrm{End}(R,+)$$
That is the most trivial "representation" of $R$, but, for example, in the case of matrices, there are simpler abelian groups to use.
So, let's say we only have the notion of addition and an order ($<$) on the real numbers.
It turns out, for each $r\in\mathbb R$ there is a unique $f_r\in\mathrm{End}(\mathbb R)$ which has $f_r(1)=r$ and is "nice" with respect to the notion of between-ness - if $c$ is between $a$ and $b$ (inclusive) then $f_r(c)$ is between $f_r(a)$ and $f_r(b)$ (inclusive). (This "between-ness" condition can be seen as requiring $f_r$ to be continuous.)
Thus there is a map:
$$\mathbb R\hookrightarrow \mathrm{End}(\mathbb R,+)\\r\mapsto f_r$$
Now we can define multiplication in $\mathbb R$ as $r\cdot s=f_r(s)$. It takes some effort to prove that this works, but the more remarkable thing is not that multiplication is distributive - that is built in by the nature of how we defined endomorphisms. The remarkable thing is that multiplication is associative and commutative.
[It's easier to define multiplication on the integers and rationals, because you don't require the order part - if $A=(\mathbb Q,+)$ or $(\mathbb Z,+)$ then there is exactly one $f_p\in\mathrm{End}(A)$ which satisfies $f_p(1)=p$.]
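The statement that an additive endomorphism is pinned down by where it sends $1$ can be brute-forced in a finite setting. The sketch below swaps in the cyclic group $\mathbb Z/6\mathbb Z$ as a stand-in for $(\mathbb Z,+)$; that substitution is my assumption for the sake of a finite search, and the names `is_additive`, `endos` and `mul` are hypothetical.

```python
from itertools import product

N = 6  # work in Z/6Z, a finite stand-in for (Z, +); not taken from the text


def is_additive(f):
    """f is a tuple with f[x] the image of x; check f(a + b) = f(a) + f(b)."""
    return all(f[(a + b) % N] == (f[a] + f[b]) % N
               for a in range(N) for b in range(N))


# Enumerate every function Z/6Z -> Z/6Z and keep the additive ones.
endos = [f for f in product(range(N), repeat=N) if is_additive(f)]

# Each possible value of f(1) occurs exactly once among the endomorphisms.
values_at_one = sorted(f[1] for f in endos)


def mul(r, s):
    """Define r * s as f_r(s), where f_r is the endomorphism with f_r(1) = r."""
    f_r = next(f for f in endos if f[1] == r)
    return f_r[s]


# The product recovered this way agrees with multiplication mod 6.
matches = all(mul(r, s) == (r * s) % N for r in range(N) for s in range(N))

print(len(endos), values_at_one, matches)
```

Distributivity of `mul` over addition is then automatic, since each $f_r$ is additive by construction; commutativity and associativity are the parts that need an argument.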
This notion of endomorphisms is actually pretty useful. For example, a vector space can be defined as an abelian group and a field $k$ with a ring homomorphism: $$k\hookrightarrow \mathrm{End}(A)$$
And the same for $R$-modules, although in that case the map $R\to \mathrm{End}(A)$ is not required to be an inclusion.
Units of measure
One last thing to note is to wonder what happens when units are in use.
You can only add two numbers if they are in the same "units." You don't add $1$ inch to $2$ meters. You have to convert units in that case. You certainly can't add $1$ meter per second to $2$ meters.
However, you can multiply any two numbers with any units; you just get a different unit. For example: $1\text{ m/s}\times 3\text{ s}=3\text{ m}$.
So, in the equation $a(b+c)=ab+ac$, you get the same units and are "allowed" to do this operation if $b$ and $c$ have the same units.
But if you had $a+bc=(a+b)(a+c)$, then the right side would require $a,b,c$ and $bc$ to all have the same units. That essentially means that they can't have units, unless you can find a unit $u$ such that $u^2$ is the same unit as $u$.
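The units argument can be sketched in code. The `Quantity` class and the `q` helper below are hypothetical and written only for this illustration (this is not a real units library): addition insists on matching units, while multiplication adds unit exponents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    units: tuple  # sorted tuple of (unit name, exponent) pairs

    def __add__(self, other):
        # Addition is only allowed between quantities with identical units.
        if self.units != other.units:
            raise TypeError(f"cannot add {self.units} to {other.units}")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        # Multiplication is always allowed; exponents of like units add up.
        exps = dict(self.units)
        for name, e in other.units:
            exps[name] = exps.get(name, 0) + e
        units = tuple(sorted((n, e) for n, e in exps.items() if e != 0))
        return Quantity(self.value * other.value, units)


def q(value, **exps):
    """Hypothetical helper: q(1.0, m=1, s=-1) stands for 1 m/s."""
    return Quantity(value, tuple(sorted(exps.items())))


speed = q(1.0, m=1, s=-1)
t1, t2 = q(3.0, s=1), q(2.0, s=1)

# a(b + c) = ab + ac type-checks because b and c share a unit.
lhs = speed * (t1 + t2)
rhs = speed * t1 + speed * t2

# a + bc does not even make sense here: m/s cannot be added to s^2.
try:
    speed + t1 * t2
    swapped_ok = True
except TypeError:
    swapped_ok = False

print(lhs == rhs, lhs, swapped_ok)
```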
My way of interpreting the distributive law in an image:
The reason why people get annoyed with stuff like this is because they see the operations $+$ and $\times$ as symbols, not actual processes.
Standard index laws when using $\times$ allow us to relate multiplication with dimensions.
Standard addition laws regarding 'like terms' mean we can't just simplify two entities from these 'separate' dimensions.
From image: "The output of the * operator can be visualised by adding dimensions (as per the index laws, x^2 * x^3 = x^5)" "The output of the + operator can be visualised as lumping two of the same dimension objects together." "If you imagine x=a=b=c and apply the distributive law, you'll end up with non-similar terms. i.e. you can't group x^2 and x terms"
Update: The previous version of this answer was somewhat misleading, suggesting that there might be non-trivial rings where the sum distributes over the product.
Suppose that $(A,+,*)$ is a ring where $+$ distributes over $*$, i.e. where
$$ a + b*c = (a+b)*(a+c) \quad \text{for every } a,b,c \in A. \tag{1} \label{eq:dist} $$
By the definition of a ring, there must be an element $0 \in A$ that acts as the identity for $+$, and $0*0=0$, therefore
$$ a = a + 0 = a + 0*0 = (a+0)*(a+0) = a*a \quad \text{for every } a \in A $$
and a ring where every element is idempotent is called a Boolean ring.
On the other hand, since $A$ is a ring we know that $0*a = 0$ for every $a \in A$, so $\eqref{eq:dist}$ implies that we must also have
$$ a = a + 0*a = (a + 0)*(a+a) = a*a + a*a = a+a $$
where the third equality is the ordinary distributivity of $*$ over $+$ and the last one uses $a*a=a$. Adding the additive inverse of $a$ (which must exist for $A$ to be a ring) to both sides gives
$$ 0 = a. $$
Therefore the only ring where $\eqref{eq:dist}$ can hold is the trivial ring, $A = \{0\}$.
In case this wasn't clear, $(\Bbb{R},+,*)$ is a ring, and clearly $\Bbb{R} \neq \{0\}$.
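As a sanity check on this conclusion, one can brute-force the rings $\mathbb Z/n\mathbb Z$ for small $n$. These rings are just a convenient family to test, chosen by me; the function name is hypothetical, and the search illustrates the result rather than replacing the proof.

```python
def sum_distributes_over_product(n):
    """True if a + b*c == (a + b)*(a + c) holds for all a, b, c in Z/nZ."""
    R = range(n)
    return all((a + b * c) % n == ((a + b) * (a + c)) % n
               for a in R for b in R for c in R)


# Only the trivial ring Z/1Z = {0} passes.
hits = [n for n in range(1, 13) if sum_distributes_over_product(n)]
print(hits)  # [1]
```

For $n \ge 2$ the triple $a=1$, $b=0$, $c=1$ already fails, since the left side is $1$ and the right side is $2$.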
On the other hand, rings are not the only kind of algebraic structure comprised of a set with two operations defined on it, and for each of those we can define a notion of distributivity.
For example, it can be proved that in a lattice $(L,\vee,\wedge)$, if one operation distributes over the other, then the converse must also hold. In other words, we have that $$ a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c) \quad \text{for every } a,b,c \in L $$ if and only if $$ a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c) \quad \text{for every } a,b,c \in L $$ and in this case the lattice is simply said to be distributive. In his answer Thomas Andrews gave an example of a distributive lattice structure on $\Bbb{R}$.
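The "if and only if" can be seen on two small lattices. Both examples below are my own picks rather than anything from the answers: the divisors of $60$ under $\operatorname{lcm}$ and $\gcd$, where both laws hold, and the five-element "diamond" lattice (bottom, top, and three incomparable elements), where both fail together. The helper names are hypothetical.

```python
from itertools import product
from math import gcd


def both_laws(elements, join, meet):
    """Return (join distributes over meet, meet distributes over join)."""
    triples = list(product(elements, repeat=3))
    j_over_m = all(join(a, meet(b, c)) == meet(join(a, b), join(a, c))
                   for a, b, c in triples)
    m_over_j = all(meet(a, join(b, c)) == join(meet(a, b), meet(a, c))
                   for a, b, c in triples)
    return j_over_m, m_over_j


# Example 1: divisors of 60, with join = lcm and meet = gcd.
def lcm(a, b):
    return a * b // gcd(a, b)

divisors = [d for d in range(1, 61) if 60 % d == 0]
divisor_result = both_laws(divisors, lcm, gcd)


# Example 2: the diamond lattice with bottom "0", top "1", atoms x, y, z.
def diamond_meet(p, q):
    if p == q:
        return p
    if p == "1":
        return q
    if q == "1":
        return p
    return "0"  # two distinct elements below the top meet at the bottom

def diamond_join(p, q):
    if p == q:
        return p
    if p == "0":
        return q
    if q == "0":
        return p
    return "1"  # two distinct elements above the bottom join at the top

diamond_result = both_laws(["0", "x", "y", "z", "1"], diamond_join, diamond_meet)

print(divisor_result, diamond_result)  # (True, True) (False, False)
```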