Why should the tensor product of $\mathcal{D}_X$-modules over $\mathcal{O}_X$ be a $\mathcal{D}_X$-module?
This is a replacement for an old confused answer. There is a related context in which I know a good answer. Suppose $A$ is a ring and $S$ is a central subring. If $M$ and $N$ are $A$-modules, then $M \otimes_S N$ is an $A \otimes_S A$-module and, if we have a map of $S$-algebras $\Delta: A \to A \otimes_S A$, then this makes $M \otimes_S N$ into an $A$-module again. The data of such a $\Delta$ makes $A$ into a bi-algebra.
Similarly, $\mathrm{Hom}(M,S)$ is an $A^{op}$-module and, if we have a map of $S$-algebras $r: A \to A^{op}$, then $\mathrm{Hom}(M,S)$ becomes an $A$-module again. Such an $r$ and $\Delta$, if they obey the correct compatibilities, make $A$ into a Hopf algebra.
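A standard illustration of this pattern (an example chosen here, not something special to the question): take $S = k$ a field and $A = U(\mathfrak g)$, the enveloping algebra of a Lie algebra $\mathfrak g$. On generators $X \in \mathfrak g$ one has $$ \Delta(X) = X \otimes 1 + 1 \otimes X, \qquad r(X) = -X, $$ and the resulting actions are $$ X \cdot (m \otimes n) = Xm \otimes n + m \otimes Xn, \qquad (X \cdot \lambda)(m) = -\lambda(Xm) $$ for $m \in M$, $n \in N$ and $\lambda \in \mathrm{Hom}(M,k)$.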
But this is not the right description for $D$-modules. $R$ is not central in $D$. And I am confused about how to fix this. Indeed, $D \otimes_R D$ is not a ring at all! (If you think it is, do $(d/dx) \otimes 1$ and $1 \otimes x$ commute? What about $(d/dx) \otimes 1$ and $x \otimes 1$? But $x \otimes 1 = 1 \otimes x$!)
The best description I can give of the action of $D$ on $M \otimes_R N$ is to take the unique map of rings $D \to D \otimes_k D$ sending $X \mapsto X \otimes 1 + 1 \otimes X$ for $X$ a vector field. $D \otimes_k D$ acts on $M \otimes_k N$ and, for some unclear reason, the image of $D$ under this map passes to the quotient $M \otimes_R N$. Similarly, the action on $\mathrm{Hom}(M,N)$ uses the map $X \mapsto -X \otimes 1 + 1 \otimes X$ to $D \otimes_k D^{op}$.
I don't understand why this works. Other people have suggested that the term "bi-algebroid" is the right context to understand this, but I have to admit I don't understand the sources on bi-algebroids.
Finally, on a smooth projective variety, there need not be any map $D \to D^{op}$. For example, consider $\mathbb{P}^1$ with open chart $\mathrm{Spec}\ k[x]$ and suppose $\phi: D \to D^{op}$ is a map of rings and of $\mathcal{O}_{\mathbb{P}^1}$-modules. Then $\phi(d/dx) x - x \phi(d/dx) = -1$, so $\phi(d/dx)$ is of the form $-d/dx + h(x)$ for some $h(x) \in k[x]$. But $x^2 (d/dx)$ is defined on all of $\mathbb{P}^1$, so $\phi(x^2 d/dx)$ must extend to all of $\mathbb{P}^1$. Since products are reversed in $D^{op}$, we get $\phi(x^2 d/dx) = \phi(d/dx) x^2 = - x^2 d/dx - 2x + x^2 h(x)$, and $-2x+x^2 h$ cannot be a global regular function on $\mathbb{P}^1$.
OK, I'll give it a shot. The bi-algebra structure on $D$ is something that I found very confusing too, so I will try to spell it out as best I understand. These ideas were explained to me by Pavel Safronov, and I found these notes by Gabriella Bohm to be helpful https://arxiv.org/abs/0805.3806 (though they deal with a more general case than we need here). See also the original papers by Sweedler and Takeuchi from the '70s.
The $D$-module set-up
Suppose $X$ is a smooth algebraic variety, and $D=D_X$. The situation we have is the following: the category $D-mod$ and the forgetful functor to $\mathcal O-mod$ are equipped with (symmetric) monoidal structures (the duality of $D$-modules will be discussed later).
There are many ways to understand why this should be the case, as some of the other answers indicate. For example, the category $D-mod$ can be understood as quasi-coherent sheaves on the de Rham space $X_{dR}$ (and the ring $D$ expresses the descent data on the pullback to quasi-coherent sheaves on $X$). Alternatively, if one thinks of $D$ as a deformation quantization of $T^\ast X$, then the monoidal structure arises from the fact that the cotangent bundle is a symplectic group(oid) acting (trivially) on $X$. One can think of $T^\ast X$ as being a commutative group object in the category of symplectic varieties and Lagrangian correspondences (I find this last perspective helpful in unpacking the notion of bialgebroid).
However, I think what you are after is not why $D$-modules have this structure, but what structure on the ring $D$ endows $D-mod$ with these structures. The answer (as has already been mentioned) is that $D$ is a bialgebroid over $\mathcal O$. Let me try my best to unpack what that means below.
The categorical structure
(You can ignore this bit if you don't like it).
Consider the following situation: we have monoidal categories $\mathcal C$ and $\mathcal D$ and a monoidal functor $$F:\mathcal D \to \mathcal C$$ Suppose also that the functor $F$ is monadic, so that $\mathcal D$ can be expressed as modules for a monad $T$ acting on $\mathcal C$. The monoidal structures on $\mathcal D$, $\mathcal C$ and $F$ must then be reflected in the monad $T$. Such a structure on a monad (acting on a monoidal category $\mathcal C$) is called a bimonad. Rather than saying what all this means in general, let's consider a special case.
Bialgebroids (over a commutative base)
Suppose $R$ is a commutative ring, and let $\mathcal C = R-mod$. Then a (colimit preserving) monad acting on $R-mod$ is nothing more than an $R$-ring, i.e. a ring $B$ with a ring homomorphism $R\to B$ (note that $R$ need not be central in $B$). In the case we are interested in, $B=D$ and $R=\mathcal O$.
Before giving an algebraic definition of a bialgebroid, we note that the point of all this is that a (left) bialgebroid structure on $B$ is exactly equivalent to the data of a monoidal structure on $B-mod$ and on the forgetful functor to $R-mod$. Note that if $R$ is central in $B$, this is the usual Tannakian theory, and an $R$-bialgebroid is just an $R$-bialgebra.
So what is an $R$-bialgebroid? Well, we already know that $B$ is an $R$-ring, so there is a product: $$ B_{\bullet} \otimes_{R} {}_\bullet B \to B $$ where the dots indicate on which side $R$ is acting on $B$. As one might expect, there is also a coproduct, which tells you how $B$ should act on the tensor product $M \otimes_R N$ of two left $B$-modules, but one has to be careful about which monoidal category the coalgebra structure on $B$ lives in. If you unwind the definitions, you see that the coproduct is given by a map $$ B \to {}_\bullet B \otimes_R {}_\bullet B $$ Note that, unlike in the product map, $R$ is acting on the left on both factors. This is a little confusing at first, but perhaps not so surprising if you consider that in the category $B-mod$ we want to understand how to tensor two left $B$-modules.
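To make this concrete for $B = D$ and $R = \mathcal O$, here is a sketch on generators, assuming (as in the question) that $D$ is generated by functions $f$ and vector fields $X$: $$ \Delta(f) = f \otimes 1 = 1 \otimes f, \qquad \Delta(X) = X \otimes 1 + 1 \otimes X. $$ The induced action on $M \otimes_{\mathcal O} N$ is $X(m \otimes n) = Xm \otimes n + m \otimes Xn$, and the Leibniz rule $X(fm) = X(f)m + fXm$ is what makes it well defined: $$ \begin{aligned} X(fm \otimes n) &= X(f)m \otimes n + fXm \otimes n + fm \otimes Xn, \\ X(m \otimes fn) &= Xm \otimes fn + m \otimes X(f)n + m \otimes fXn, \end{aligned} $$ and the two right-hand sides agree term by term in $M \otimes_{\mathcal O} N$, because $f$ can be moved across the tensor sign. The same computation shows that $X(f \cdot (m\otimes n)) - f \cdot X(m \otimes n) = X(f) \cdot (m \otimes n)$, so the relation $[X,f] = X(f)$ of $D$ is respected.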
Of course, there are some axioms. The one that I found hardest to digest involves something called the Takeuchi product. Let me try to motivate that a bit.
Takeuchi Product
In the usual theory of bialgebras, there is an axiom which says that the coproduct is an algebra map. This doesn't make sense for bialgebroids as ${}_\bullet B\otimes_R {}_\bullet B$ is not an algebra under componentwise multiplication. The Takeuchi product is a certain subspace of this object, defined by: $$ B {}_R \times B := \left\{ \sum b_i \otimes b_i' \in {}_\bullet B\otimes_R {}_\bullet B \ \middle|\ \sum b_i r \otimes b_i' = \sum b_i \otimes b_i'r \text{ for all } r \in R \right\} $$ Note that the $r$'s in the condition are acting on the right, whereas the relative tensor product is using multiplication on the left. Note also that if $R$ is central in $B$, then the condition is vacuous. One can check that $B {}_R \times B$ is a ring under componentwise multiplication. One of the axioms of a bialgebroid is that the coproduct map factors through the Takeuchi product and is a ring homomorphism. (There is another interesting bialgebroid axiom, which is about the counit map, but for brevity, I won't discuss that).
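As a quick sanity check of the condition for $B = D$, $R = \mathcal O$ (a sketch, assuming the coproduct sends a vector field $X$ to $X \otimes 1 + 1 \otimes X$): in ${}_\bullet D \otimes_{\mathcal O} {}_\bullet D$ we have $fa \otimes b = a \otimes fb$, and in $D$ we have $Xf = fX + X(f)$ for a function $f$. Hence $$ Xf \otimes 1 + f \otimes X = X \otimes f + 1 \otimes X(f) + 1 \otimes fX = X \otimes f + 1 \otimes Xf, $$ which is exactly the condition $\sum b_i f \otimes b_i' = \sum b_i \otimes b_i' f$ for the element $X \otimes 1 + 1 \otimes X$. By contrast, $X \otimes 1$ on its own fails the condition, since $Xf \otimes 1 - X \otimes f = 1 \otimes X(f)$, which is nonzero whenever $X(f) \neq 0$. This is the same obstruction as in the question's example with $(d/dx) \otimes 1$ and $x \otimes 1 = 1 \otimes x$.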
The Takeuchi product (which in the $R$ commutative case appears to be due to Sweedler?) seemed somewhat mysterious to me until I saw that there is a ring isomorphism: $$ B {}_R \times B \simeq \mathrm{End}_{B^{op}\otimes B^{op}} ({}_\bullet B \otimes_R {}_\bullet B ) $$ Thus, the comultiplication map is nothing more than the structure of a left $B$, right $B\otimes B$ bimodule on ${}_\bullet B\otimes_R {}_\bullet B$. This fits well with the $D$-modules story: the coproduct on $\mathcal D$ is precisely the transfer bimodule structure on $$ \mathcal D_{X\to X\times X} = {}_\bullet \mathcal D \otimes_{\mathcal O} {}_{\bullet} \mathcal D$$ (as it should be, as the transfer bimodule represents the tensor product functor).
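Here is a sketch of how I understand the isomorphism, with $t = \sum b_i \otimes b_i'$ denoting an element of the Takeuchi product. The element $t$ acts on ${}_\bullet B \otimes_R {}_\bullet B$ by componentwise left multiplication, $$ t \cdot (x \otimes y) = \sum b_i x \otimes b_i' y, $$ and for this to respect the relation $rx \otimes y = x \otimes ry$ one needs $$ \sum b_i r x \otimes b_i' y = \sum b_i x \otimes b_i' r y \quad \text{for all } r \in R, $$ which follows from the Takeuchi condition (and reduces to it when $x = y = 1$). This action visibly commutes with right multiplication by $B \otimes B$. In the other direction, an endomorphism $\varphi$ commuting with the right $B \otimes B$ action is determined by $\varphi(1 \otimes 1)$, because $1 \otimes 1$ generates ${}_\bullet B \otimes_R {}_\bullet B$ as a right $B \otimes B$-module, and the equality $r \otimes 1 = 1 \otimes r$ forces $\varphi(1 \otimes 1)$ to satisfy the Takeuchi condition.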
The D-module structure on Hom
Let me come back to this another time...
One way to think about this is that $D$ is the universal enveloping algebra $U(R,L)$ of the $(k,R)$ Lie-Rinehart algebra $L = \mathrm{Der}_k(R,R)$. Whenever one has such an enveloping algebra one may perform these kinds of constructions. More details can be found in https://arxiv.org/abs/dg-ga/9702008. See section 2 in particular.
The well-known fact for $D$-modules that tensoring with the 'canonical sheaf', that is the top exterior power of $\Omega^1_{R/k}$, defines an equivalence between left $D$-modules and right $D$-modules also holds in this more general setting provided that $L$ is a projective $R$-module of finite rank (then the $R$-linear dual of the top exterior power of $L$ plays the role of the canonical sheaf).
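For concreteness, in the classical $D$-module case the right module structure on $\omega \otimes_R M$ (with $\omega$ the canonical sheaf and $M$ a left $D$-module) is usually written, for a function $f$ and a vector field $X$, as $$ (\eta \otimes m) \cdot f = f\eta \otimes m, \qquad (\eta \otimes m) \cdot X = -\mathrm{Lie}_X(\eta) \otimes m - \eta \otimes Xm. $$ As a one-variable check, take $R = k[x]$, $\eta = dx$ and $X = d/dx$: then $\mathrm{Lie}_X(dx) = d(1) = 0$, so $(dx \otimes m) \cdot (d/dx) = -\,dx \otimes (d/dx)m$. In other words, after trivializing $\omega$ by $dx$, the left action of $d/dx$ becomes the right action of $-d/dx$, which is the local anti-automorphism $x \mapsto x$, $d/dx \mapsto -d/dx$ of the Weyl algebra.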