Difference of two binomial random variables
I can give you an answer for the pmf of $X-Y$. From there $|X-Y|$ is straightforward.
So we start with
$X \sim \text{Bin}(n_1, p_1)$
$Y \sim \text{Bin}(n_2, p_2)$
We are looking for the probability mass function of $Z=X-Y$
First note that the support of $Z$ runs from $-n_2$ to $n_1$, since those bounds come from the most extreme cases: ($X=0$ and $Y=n_2$) and ($X=n_1$ and $Y=0$).
Then we need a modification of the binomial pmf so that it can cope with values outside of its support.
$m(k, n, p) = \binom {n} {k} p^k (1-p)^{n-k}$ when $0 \leq k \leq n$, and $0$ otherwise.
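As an aside, this is easy to express in R; base R's dbinom already returns 0 outside the support $0, \ldots, n$, so the explicit guard below just mirrors the definition (a minimal sketch):

m <- function(k, n, p) {
  # modified binomial pmf: zero outside the support 0..n
  ifelse(k >= 0 & k <= n, dbinom(k, n, p), 0)
}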
Then we need to consider two cases:
- $Z \geq 0$
- $Z \lt 0$
In the first case
$p(z) = \sum_{i=0}^{n_1} m(i+z, n_1, p_1) m(i, n_2, p_2)$
since this covers all the ways in which $X-Y$ could equal $z$. For example, $z=1$ is reached when $X=1$ and $Y=0$, when $X=2$ and $Y=1$, when $X=3$ and $Y=2$, and so on. It also deals with combinations that cannot happen because of the values of $n_1$ and $n_2$: for example, if $n_1 = 4$ then we cannot get $Z=1$ from $X=5$ and $Y=4$, and thanks to our modified binomial pmf that term's probability is zero.
For the second case we just reverse the roles: now $X=i$ and $Y=i-z$, so for example $z=-1$ is reached when $X=0$ and $Y=1$, when $X=1$ and $Y=2$, and so on.
$p(z) = \sum_{i=0}^{n_2} m(i, n_1, p_1) \, m(i-z, n_2, p_2)$
Put them together and that's your pmf.
$ f(z)= \begin{cases} \sum_{i=0}^{n_1} m(i+z, n_1, p_1) \, m(i, n_2, p_2), & \text{if } z\geq 0\\ \sum_{i=0}^{n_2} m(i, n_1, p_1) \, m(i-z, n_2, p_2), & \text{otherwise} \end{cases} $
Here's the function in R and a simulation to check that it's right (and it does work):
https://gist.github.com/coppeliaMLA/9681819
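If you don't want to follow the link, here is a minimal R sketch of $f(z)$ written directly from the formula above, reusing the helper m sketched earlier (the gist's code may differ in details):

f <- function(z, n1, p1, n2, p2) {
  if (z >= 0) {
    # z >= 0: X = i + z, Y = i
    i <- 0:n1
    sum(m(i + z, n1, p1) * m(i, n2, p2))
  } else {
    # z < 0: X = i, Y = i - z
    i <- 0:n2
    sum(m(i, n1, p1) * m(i - z, n2, p2))
  }
}

# sanity check: the pmf should sum to 1 over the support -n2..n1
sum(sapply(-7:4, f, n1 = 4, p1 = 0.3, n2 = 7, p2 = 0.6))  # 1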
I doubt there is a special name for the distribution in general. There is one special case of interest: $p_1 = 1 - p_2$. Note that $n_2 - Y \sim \text{Bin}(n_2, 1-p_2) = \text{Bin}(n_2, p_1)$, so $X$ and $n_2 - Y$ are independent binomials with the same success probability, and hence in this special case $X - Y + n_2 \sim \text{Bin}(n_1 + n_2, p_1)$.
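A quick R simulation, with illustrative parameters of my choosing, is consistent with this:

n1 <- 5; n2 <- 3; p1 <- 0.4   # illustrative values, p2 = 1 - p1
x <- rbinom(100000, n1, p1)
y <- rbinom(100000, n2, 1 - p1)
mean(x - y + n2)  # should be close to (n1 + n2) * p1 = 3.2
var(x - y + n2)   # should be close to (n1 + n2) * p1 * (1 - p1) = 1.92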
This question is more tricky than it sounds. To solve it, I will use a combination of manual and automated methods; in particular, computer algebra tools: the mathStatica package for Mathematica (of which I am an author), and Mathematica itself.
If I may change the notation slightly:
The Problem
Let $X_1 \sim \text{Binomial}(n,p)$ and $X_2 \sim \text{Binomial}(m,q)$ be independent.
Find the pmf of $|X_1-X_2|$.
Given: Due to independence, the joint pmf of $(X_1, X_2)$, say $f(x_1,x_2)$, is:
$f(x_1,x_2) = \binom{n}{x_1} p^{x_1} (1-p)^{n-x_1} \binom{m}{x_2} q^{x_2} (1-q)^{m-x_2}$
for $x_1 \in \{0, 1, \ldots, n\}$ and $x_2 \in \{0, 1, \ldots, m\}$.
Solution
Let $Y=X_1-X_2$ and $Z=X_2$. Then, the joint pmf of $(Y,Z)$, say $g(y,z)$, is:
$g(y,z) = f(y+z, z) = \binom{n}{y+z} p^{y+z} (1-p)^{n-y-z} \binom{m}{z} q^{z} (1-q)^{m-z}$
The same result is returned by Transform, a mathStatica function that derives the joint pmf using the Method of Transformations. Deriving the domain of support of $Y$ and $Z$ is a bit more tricky. A rough plot of the (smoothed, continuous version of the) domain of support suggests two cases:
Case 1: When $y \ge 0$: $0 \le z \le n-y$
Case 2: When $y < 0$: $-y \le z \le m$
The density of $Y=X_1-X_2$ is then obtained by summing out $Z$ in each part of the domain:
$P(Y=y) = \begin{cases} \sum_{z=0}^{n-y} g(y,z), & \text{if } y \geq 0 \\ \sum_{z=-y}^{m} g(y,z), & \text{if } y < 0 \end{cases}$
Finally, to find the pmf of $|Y|$, the pmf for strictly positive values will be:
$P(|Y| = y) = P(Y = y) + P(Y = -y), \quad y > 0$
and when $Y=0$:
$P(|Y| = 0) = P(Y = 0)$
Summary
The pmf of $|X_1-X_2|$, say $\phi(y)$, is:
$\phi(y) = \begin{cases} P(Y = 0), & \text{if } y = 0 \\ P(Y = y) + P(Y = -y), & \text{if } y > 0 \end{cases}$
with domain of support $y \in \{0, 1, \ldots, \max(m,n)\}$.
All done.
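For readers without mathStatica, here is a hedged R translation of $\phi$, built directly from the sums above rather than from the mathStatica output (the function names are mine):

pY <- function(y, n, p, m, q) {
  # P(Y = y) for Y = X1 - X2; dbinom returns 0 outside its support,
  # so one summation range covers both cases of the domain
  z <- 0:max(n, m)
  sum(dbinom(y + z, n, p) * dbinom(z, m, q))
}
phi <- function(y, n, p, m, q) {
  if (y == 0) pY(0, n, p, m, q) else pY(y, n, p, m, q) + pY(-y, n, p, m, q)
}

For example, phi(1, 12, .1, 7, .9) can be compared against the Monte Carlo check below.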
Monte Carlo check
It is always a good idea to check one's work using Monte Carlo methods. Here, for instance, are 100,000 pseudo-random drawings from each of $X_1$ and $X_2$, given some parameter assumptions:
(* 100,000 draws each from X1 ~ Binomial(12, 0.1) and X2 ~ Binomial(7, 0.9) *)
x1data = RandomVariate[BinomialDistribution[12, .1], 100000];
x2data = RandomVariate[BinomialDistribution[ 7, .9], 100000];
Next, compare the empirical distribution of $|X_1-X_2|$ (red triangles) to the theoretical density $\phi(y)$ (blue dots) derived above, given the same parameter assumptions.
Looks good :)