Generalizing Poisson's binomial distribution to the multinomial case.
Preliminary (TL;DR)
Background
In his 1991 publication, Norman C. Beaulieu answered your question w/ what he dubbed, the generalized multinomial distribution (GMD). My explanation will focus on the GMD's utility.
Notation
- # categories $= c$.
- # trials $= t$.
- Random vector $= X = \left[\begin{array}{cccc}X_1&X_2&\cdots&X_c\end{array}\right]^T$.
- Category responses after $t$ trials vector $= x = \left[\begin{array}{cccc}x_1&x_2&\cdots&x_c\end{array}\right]^T$.
- $\sum_{k = 1}^c x_k = t$.
- Probability of category response during trial matrix $= p = \left[\begin{array}{cccc} p_{1,1} & p_{1,2} & \cdots & p_{1,c} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,c} \\ \vdots & \vdots & \ddots & \vdots \\ p_{t,1} & p_{t,2} & \cdots & p_{t,c} \end{array}\right]$.
- Pmf of $X = P\left[X = x\right]$.
- $[c] = \left\{1, 2, \cdots, c\right\}$.
- Multiset of $[c] = ([c], m) = \left\{1^{m(1)}, 2^{m(2)}, \cdots, c^{m(c)}\right\}$.
- $m(i) = x_i$.
- Permutations of $([c], m) = \mathfrak{S}_{([c], m)}$.
- $card\left(\mathfrak{S}_{([c], m)}\right) = \left(m(1), m(2), \cdots, m(c)\right)!$.
Pmf of GMD
$$P\left[X = x\right] = \sum_{\mathfrak{s} \in \mathfrak{S}_{([c], m)}} \left\{\prod_{k = 1}^t \left\{p_{k,\mathfrak{s}_k}\right\}\right\}$$
So far, I've identified it as being the superclass of 7 distributions! Namely...
- Bernoulli distribution.
- Uniform distribution.
- Categorical distribution.
- Binomial distribution.
- Multinomial distribution.
- Poisson's binomial distribution.
- Generalized multinomial distribution (if your definition of superclass allows self-inclusion).
Examples
Games
- g1: A 2 sided die is simulated using a fair standard die by assigning faces w/ pips 1 through 3 & 4 through 6 to sides 1 & 2, respectively. The die is biased by etching micro holes into faces w/ pips 1 through 3 s.t. $p_1 = 12/30$ & $p_2 = 18/30$. The 2 sided die is tossed 1 time & the category responses are recorded.
- g2: Same as g1, accept w/ ideal standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$.
- g3: Same as g1, accept w/ standard die, i.e., $p_1 = p_2 = p_3 = 4/30$ & $p_4 = p_5 = p_6 = 6/30$.
- g4: Same as g1, accept die is tossed 7 times.
- g5: Same as g3, accept die is tossed 7 times.
- g6: Same as g4, accept the micro holes are filled w/ $0.07$ kg of a material, which evaporates @ $0.01$ kg/s upon being sprayed w/ an activator, s.t. $p_1 = p_2 = 15/30$ for the 1st toss. Immediately after being sprayed, category responses are recorded every second.
- g7: Same as g6, accept w/ standard die, i.e., $p_1 = p_2 = \cdots = p_6 = 5/30$ for the 1st toss.
Questions
- q1: Find pmf & evaluate when $x = \left[\begin{array}{cc}0&1\end{array}\right]^T$.
- q2: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&1&0&0&0&0\end{array}\right]^T$.
- q3: q2.
- q4: Find pmf & evaluate when $x = \left[\begin{array}{cc}2&5\end{array}\right]^T$.
- q5: Find pmf & evaluate when $x = \left[\begin{array}{cccccc}0&2&1&1&0&3\end{array}\right]^T$.
- q6: q4.
- q7: q5.
Answers w/o knowledge of GMD
- a1: $X$ ~ Bernoulli distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^2 \frac{p_k^k}{k!} = \frac{1!(12/30)^0(18/30)^1}{0!1!}$
$\Longrightarrow P\left[X = x\right] = 3/5$.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^2 \frac{p_k^k}{k!} = \frac{1!(12/30)^0(18/30)^1}{0!1!}$
- a2: $X$ ~ Uniform distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^6 \frac{p_k^k}{k!} = \frac{1!(5/30)^{0 + 1 + 0 + 0 + 0 + 0}}{0!1!0!0!0!0!}$
$\Longrightarrow P\left[X = x\right] = 1/6$.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^6 \frac{p_k^k}{k!} = \frac{1!(5/30)^{0 + 1 + 0 + 0 + 0 + 0}}{0!1!0!0!0!0!}$
- a3: $X$ ~ Categorical distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^6 \frac{p_k^k}{k!} = \frac{1!(4/30)^{0 + 1 + 0}(6/30)^{0 + 0 + 0}}{0!1!0!0!0!0!}$
$\Longrightarrow P\left[X = x\right] = 2/15$.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 1!\prod_{k = 1}^6 \frac{p_k^k}{k!} = \frac{1!(4/30)^{0 + 1 + 0}(6/30)^{0 + 0 + 0}}{0!1!0!0!0!0!}$
- a4: $X$ ~ Binomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 7!\prod_{k = 1}^2 \frac{p_k^k}{k!} = \frac{7!(12/30)^2(18/30)^5}{2!5!}$
$\Longrightarrow P\left[X = x\right] = 20412/78125$.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 7!\prod_{k = 1}^2 \frac{p_k^k}{k!} = \frac{7!(12/30)^2(18/30)^5}{2!5!}$
- a5: $X$ ~ Multinomial distribution.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 7!\prod_{k = 1}^6 \frac{p_k^k}{k!} =
\frac{7!(4/30)^{0 + 2 + 1}(6/30)^{1 + 0 + 3}}{0!2!1!1!0!3!}$
$\Longrightarrow P\left[X = x\right] = 224/140625$.
- $P\left[X = x\right] = t!\prod_{k = 1}^c \frac{p_k^k}{k!} = 7!\prod_{k = 1}^6 \frac{p_k^k}{k!} =
\frac{7!(4/30)^{0 + 2 + 1}(6/30)^{1 + 0 + 3}}{0!2!1!1!0!3!}$
- a6: $X$ ~ Poisson's binomial distribution.
- $P\left[\left[\begin{array}{cc}X_1&X_2\end{array}\right]^T = \left[\begin{array}{cc}x_1&x_2\end{array}\right]^T\right] = P\left[X_1 = x_1, X_2 = x_2\right] = P\left[X_1 = x_1\right] = P\left[X_2 = x_2\right]$.
- $p_1$ & $p_2$ are vectors now: $p_1 = \left[\begin{array}{cccc}p_{1_1}&p_{1_2}&\cdots&p_{1_t}\end{array}\right]^T, p_2 = \left[\begin{array}{cccc}p_{2_1}&p_{2_2}&\cdots&p_{2_t}\end{array}\right]^T$.
- $P\left[X_2 = x_2\right] = \frac{1}{t + 1}\sum_{i = 0}^t \left\{\exp\left(\frac{-j2\pi i x_2}{t + 1}\right) \prod_{k = 1}^t \left\{p_{2_k}\left(\exp\left(\frac{j2\pi i}{t + 1}\right) - 1\right) + 1\right\}\right\}$
$= \frac{1}{8}\sum_{i = 0}^7 \left\{\exp\left(\frac{-j5\pi i}{4}\right) \prod_{k = 1}^7 \left\{\left(\frac{0.5k + 14.5}{30}\right)\left(\exp\left(\frac{j\pi i}{4}\right) - 1\right) + 1\right\}\right\}$
$\Longrightarrow P\left[X_2 = 5\right] = 308327/1440000$.
- a7: $X$ ~ Generalized multinomial distribution.
- ???
Answers w/ Knowledge of GMD
- a1: $X$ ~ Bernoulli distribution.
- $p = \left[\begin{array}{c}\frac{12}{30}&\frac{18}{30}\end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(2\right)\right\}$.
- a2: $X$ ~ Uniform distribution.
- $p = \left[\begin{array}{c}\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
- a3: $X$ ~ Categorical distribution.
- $p = \left[\begin{array}{c}\frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30}\end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2\right)\right\}$.
- a4: $X$ ~ Binomial distribution.
- $p = \left[\begin{array}{cc} \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \\ \frac{12}{30}&\frac{18}{30} \end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
- a5: $X$ ~ Multinomial distribution.
- $p = \left[\begin{array}{cccccc} \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
- a6: $X$ ~ Poisson's binomial distribution.
- $p = \left[\begin{array}{cc} \frac{15}{30}&\frac{15}{30} \\ \frac{14.5}{30}&\frac{15.5}{30} \\ \frac{14}{30}&\frac{16}{30} \\ \frac{13.5}{30}&\frac{16.5}{30} \\ \frac{13}{30}&\frac{17}{30} \\ \frac{12.5}{30}&\frac{17.5}{30} \\ \frac{12}{30}&\frac{18}{30} \end{array}\right]$.
- $\mathfrak{S}_{([2], m)} = \left\{\left(1,1,2,2,2,2,2\right), \ldots, \left(2,2,2,2,2,1,1\right)\right\}$.
- a7: $X$ ~ Generalized multinomial distribution.
- $p = \left[\begin{array}{cccccc} \frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30}&\frac{5}{30} \\ \frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30}&\frac{4.8\overline{3}}{30} &\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30}&\frac{5.1\overline{6}}{30} \\ \frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30}&\frac{4.\overline{6}}{30} &\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30}&\frac{5.\overline{3}}{30} \\ \frac{4.5}{30}&\frac{4.5}{30}&\frac{4.5}{30} &\frac{5.5}{30}&\frac{5.5}{30}&\frac{5.5}{30} \\ \frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30}&\frac{4.\overline{3}}{30} &\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30}&\frac{5.\overline{6}}{30} \\ \frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30}&\frac{4.1\overline{6}}{30} &\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30}&\frac{5.8\overline{3}}{30} \\ \frac{4}{30}&\frac{4}{30}&\frac{4}{30}&\frac{6}{30}&\frac{6}{30}&\frac{6}{30} \end{array}\right]$.
- $\mathfrak{S}_{([6], m)} = \left\{\left(2,2,3,4,6,6,6\right), \ldots, \left(6,6,6,4,3,2,2\right)\right\}$.
- $P\left[X = x\right] = 59251/36905625$.
Final Words
I know my answer was very long (& went far beyond what OP asked for) but this had been flying around inside my head for quite some time & this q seemed like the most suitable landing strip.
I performed the last 6 calculations using the function gmdPmf (which I defined in Mathematica)...
(* GENERALIZED MULTINOMIAL DISTRIBUTION (GMD) *)
(* Note: mXn = # rows X # columns. *)
gmdPmf[
x_ (* Responses of category j, after t trials have taken place. *),
p_ (* Matrix (tXm) holds p_{trial i, category j} = P["Response of trial i is category j"]. *)
] := Module[{t, c, ⦋c⦌, allRPs, desiredRPs, count = 0, sum = 0, product = 1},
t = Total[x]; (* # trials. *)
c = Length[x]; (* # categories. *)
⦋c⦌ = Range[c]; (* Categories. *)
allRPs = Tuples[⦋c⦌,t]; (* Matrix (c^tXt) holds all the response patterns given that t trials have occurred. *)
desiredRPs = {}; (* Matrix ((x_1,x_2,...,x_c) !Xt) holds the desired response patterns; subset of allRPs wrt n. *)
For[i = 1, i <= Length[allRPs], i++,
For[j = 1, j <= c, j++, If[Count[allRPs[[i]],⦋c⦌[[j]]] == x[[j]], count++];];
If[count == c, AppendTo[desiredRPs, allRPs[[i]]]];
count = 0;
];
For[i = 1, i <= Length[desiredRPs], i++,
For[j = 1, j <= t, j++, product *= (p[[j]][[desiredRPs[[i]][[j]]]]);];
sum += product;
product = 1;
];
sum
];
(* ANSWERS *)
Print["a1: P[X = x] = ", gmdPmf[{0, 1}, {{12/30, 18/30}}], "."];
Print["a2: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{5/30, 5/30, 5/30, 5/30, 5/30, 5/30}}], "."];
Print["a3: P[X = x] = ", gmdPmf[{0,1, 0, 0, 0, 0}, {{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}], "."];
Print["a4: P[X = x] = ", gmdPmf[{2, 5}, ArrayFlatten[ConstantArray[{{12/30, 18/30}}, {7, 1}]]], "."];
Print["a5: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, ArrayFlatten[ConstantArray[{{4/30, 4/30, 4/30, 6/30, 6/30, 6/30}}, {7, 1}]]], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/2) - (1/2)*i)/30; r = ((29/2) + (1/2)*i)/30; AppendTo[p,{l,r}];]; Print["a6: P[X = x] = ", gmdPmf[{2, 5}, p], "."];
p = {}; For[i = 1, i <= 7, i++, l = ((31/6) - (1/6)*i)/30; r = ((29/6) + (1/6)*i)/30; AppendTo[p,{l,l,l,r,r,r}];]; Print["a7: P[X = x] = ", gmdPmf[{0, 2, 1, 1, 0, 3}, p], "."];
Clear[gmdPmf];
Please edit, if you know of any ways to make it shorter/faster. Congrats, if you made it to the end! (:
I have come across very few resources on the topic. Such a problem is called Generalized Poisson's distribution or GPB. I am listing a few of them here to help others.
- An old paper describing the calculation of its approximate pdf.
- A 2018 paper describing an algorithm to compute its PDF using Fourier Transforms.
- An R package implementing the same.