Why does marginalization of a joint probability distribution use sums?
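Suppose $X$ can be $1$, $2$, $3$, or $4$, and $Y$ can be $1$, $2$, or $3$. What is $\Pr(X=1)$? It is a marginal probability. The event $X=1$ occurs exactly when one of the three events $(X=1 \text{ and } Y=y)$, for $y=1,2,3$, occurs. Those three events are mutually exclusive, because $Y$ takes only one value at a time, and probabilities of mutually exclusive events add. So
\begin{align}
\Pr(X=1) & = \Pr \Big( (X=1 \text{ and } Y=1)\text{ or }(X=1 \text{ and } Y=2) \text{ or }(X=1 \text{ and } Y=3) \Big) \\[10pt]
& = \Pr(X=1\ \text{and } Y=1) + \Pr(X=1\ \text{and } Y=2) + \Pr(X=1\ \text{and } Y=3).
\end{align}
This is a sum of values of the joint probability distribution.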
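To see the same sum numerically, here is a minimal Python sketch. The joint distribution in it is made up purely for illustration (the question does not specify one): it assumes $\Pr(X=x \text{ and } Y=y) = (x+y)/54$, which sums to $1$ over all twelve cells. The names `joint` and `marginal_x` are likewise hypothetical.

```python
from fractions import Fraction

X_VALUES = [1, 2, 3, 4]
Y_VALUES = [1, 2, 3]

# Hypothetical joint distribution (an assumption for illustration only):
# joint[(x, y)] = Pr(X = x and Y = y) = (x + y) / 54.
# Exact fractions are used so the sums carry no rounding error.
joint = {(x, y): Fraction(x + y, 54) for x in X_VALUES for y in Y_VALUES}


def marginal_x(x):
    """Pr(X = x): add the joint probabilities over every possible value of Y."""
    return sum(joint[(x, y)] for y in Y_VALUES)


# Pr(X=1) = Pr(X=1, Y=1) + Pr(X=1, Y=2) + Pr(X=1, Y=3) = 2/54 + 3/54 + 4/54
print(marginal_x(1))  # prints 1/6
```

Summing `marginal_x(x)` over all four values of `x` gives $1$, as it must for a probability distribution.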