What are efficient pooling designs for RT-PCR tests?

Let me get started with a small take at question 3, by proving that for $v\le 6$, the complete quadrilateral is optimal.

First, for $v\in\{1,2,3\}$ it is clear that no pooling design can have compression rate $r<1$ (so trivial is optimal). For example for $v=3$, we need to distinguish at least $5$ situations (no positives, at least $2$ positives, and $3$ possible single positives), so $2$ bits of information cannot be enough and we must have $e\ge 3$.

Thus $v=4$ is the first case where the trivial bound does not preclude a pooling design of interest (we need to distinguish $6$ situations, leading to the bound $e\ge3$). However:

Proposition. There are no pooling design with $v=4$ and $r<1$.

Proof. Assume $(V,E)$ is a pooling design with $V=\{1,2,3,4\}$ and $e=3$. If an element of $E$ is a singleton, then removing it from $E$ and its element from $V$ would give a pooling design with $v=3$ and $e=2$, which is impossible. If two elements $p,q$ of $E$ are contained one in the other, $p\subset q$, then replacing $q$ with $q\setminus p$ gives a pooling design (more information is carried by the results of $(p,q\setminus p)$ than by the results of $(p,q)$).

We can thus assume that no element of $E$ is a singleton, and no element of $E$ contains another (these are general arguments than can be used more widely).

In particular, all elements of $E$ have $2$ or $3$ elements.

No vertex can belong to all edges, since otherwise the positivity of this vertex would entail positivity of all edges, an event that cannot be distinguished from all vertices being positives.

No vertex $a$ can be contained in only one edge, otherwise the positivity of another vertex $b$ of this edge could not be distinguished from the positivity of $a$ and $b$.

It follows that all vertices must have degree exactly $2$. The total degree is thus $8$, and we must have two elements of $E$ of cardinal $3$ and the last one of cardinal $2$. But then the two largest edges must have two elements in common, which thus have the same link, a contradiction. $\square$

The same arguments lead to:

Proposition. A pooling design with $v=5$ must have $e\ge 4$.

Note that $(v,e) = (5,4)$ can be realized by removing a vertex from the complete quadrilateral.

Proof. Assume that $(V,E)$ is a pooling design with $v=5$ and $e=3$. Then its edges have cardinal $2,3$ or $4$ and its vertices all have degree $2$. The total degree is $10$, which can be achieved in two ways.

First, the decomposition $10=4+4+2$, i.e. two edges have $4$ elements each. But then these edges have two elements in common, which cannot be distinguished since they have degree $2$.

Second, the decomposition $10=4+3+3$. Then letting $V=\{1,2,3,4,5\}$ and $E=\{p,q,r\}$ with $p=\{1,2,3,4\}$, we must have $5^* = \{q,r\}$. Each of $q$ and $r$ have $3$ elements, including $5$. Therefore, up to symmetry, $q=\{1,2,5\}$ and $r=\{3,4,5\}$. Then $1^*=2^*$ and $3^*=4^*$, impossible. $\square$

Corollary. The complete quadrilateral is optimal for order $6$. For order $v< 6$, the only other pooling design with compression rate $r<1$ is obtained by removing one vertex from the complete quadrilateral.


This isn't a full answer, but too long for a comment. I suppose it comes closest to trying to answer Question 3 or the general question of whether the hypercube design can be improved.

Definition Given a hypergraph $G=(\{v_1, \dots, v_n\}, E)$, the dual of $G$ is the hypergraph $H$ with $V(H)=E(G)$ and $E(H)=\{\{e\in E(G): v_i\in e\}: i\in [k]\}$ (in other words, each edge of $H$ is a maximal collection of edges from $G$ which are incident with a single vertex).

Let $H_{n,k}$ be the dual of $K_n^{k}$, the complete $k$-regular hypergraph on $n$ vertices. Note that the dual of $H_{n,k}$ is isomorphic to $K_n^k$.

(It seems to me that this hypergraph must have been studied before, but I couldn't find any reference to it. One possible lead is that $H_{4,2}$ is what you call the complete quadrilateral.)

Claim 1. $H_{n,k}$ is a $\binom{n-1}{k-1}$-uniform $k$-regular hypergraph with $\binom{n}{k}$ vertices and $n$ edges.

Proof. In $K_n^k$, every vertex is incident with $\binom{n-1}{k-1}$ edges, every edge has order $k$, there are $\binom{n}{k}$ edges, and $n$ vertices.$\square$

Claim 2. $H_{n,k}$ is a pooling design.

Proof. Every vertex in $H_{n,k}$ is incident with $k$ edges, so $|x^*|=k$. If $X$ is a set of vertices with $|X|>1$ (which corresponds to a set of more than one edge in $K_n^k$, which spans more than $k$ vertices in $K_n^k$) then $|X^*|>k$. So $x^*\neq X^*$ if $|X|>1$.$\square$

The compression rate of $H_{n,k}$ is $\frac{n}{\binom{n}{k}}$ which is minimized when $k=\lfloor{n/2}\rfloor$. Also note that ratio of the uniformity to the number of vertices is $\binom{n-1}{k-1}/\binom{n}{k}=k/n$. So there is a tradeoff when minimizing the compression rate, since the uniformity and degree increase when we increase $k$.

Some more examples: $H_{5,2}$ is 4-uniform with 10 vertices and 5 edges giving a compression ratio of $1/2$. $H_{6,3}$ is 10-uniform with 20 vertices and 6 edges, giving a compression ratio of $3/10$. $H_{7,3}$ is 15-uniform with 35 vertices and 7 edges, giving a compression ratio of $1/5$. Note that the hypercube design with $D=3$ is 9-regular with 27 vertices and 9 edges and thus a compression ratio of 1/3, so $H_{6,3}$ and $H_{7,3}$ compare favorably in this case.

Update 1. (It seems best to update my previous answer rather than write a new one.)

After thinking it over some more, I think I have an alternate characterization of pooling designs which both makes it easier to check if $H$ is a pooling design and elucidates some features of pooling designs. In particular, this gives a simple proof of the Propositions in your answer.

Claim 3 $H$ is a pooling design if and only if $x^*\not\subseteq y^*$ for all distinct $x,y\in V(H)$.

Proof. ($\Rightarrow$) Suppose there exists distinct $x,y\in V(H)$ such that $x^*\subseteq y^*$. Then $y^*=\{x,y\}^*$ and thus $H$ is not a pooling design.

($\Leftarrow$) Suppose $H$ is not a pooling design; that is, suppose there exists $y\in V(H)$ and $Y\subseteq V(H)$ with $Y\neq \{y\}$ such that $y^*=Y^*$. Since $Y\neq \{y\}$, there exists $x\in Y$ such that $x\neq y$. Since $x\in Y$, we have $x^*\subseteq Y^*=y^*$. $\square$

Corollary 1 Let $H$ be a hypergraph and let $G$ be the dual of $H$. $H$ is a pooling design if and only if $e\not\subseteq f$ for all distinct $e,f\in E(G)$.

Proof. ($\Rightarrow$) Suppose $H$ is a pooling design. Choose distinct $e,f\in E(G)$ which correspond to distinct $x, y\in V(H)$ respectively. Since $x^*\not\subseteq y^*$, we have $e\not\subseteq f$.

($\Leftarrow$) Suppose $e\not\subseteq f$ for all distinct $e,f\in E(G)$. Choose distinct $x,y\in V(H)$ which correspond to distinct $e,f\in E(G)$. Since $e\not\subseteq f$, we have $x^*\not\subseteq y^*$. $\square$

Corollary 2 Let $H$ be a hypergraph with $e$ edges and $n$ vertices such that $\binom{e}{\lfloor{e/2}\rfloor}<n$. Then $H$ is not a pooling design.

Proof. Let $G$ be the dual of $H$ and note that $G$ has $e$ vertices and $n$ edges. Since $|E(G)|=n>\binom{e}{\lfloor{e/2}\rfloor}=\binom{|V(G)|}{\lfloor{|V(G)|/2}\rfloor}$, Sperner's theorem implies that there exists distinct $e,f\in E(G)$ such that $e\subseteq f$. Thus $H$ is not a pooling design by Corollary 1. $\square$

In particular, this proves that every pooling design on $4\leq n\leq 6$ vertices has at least 4 edges, every pooling design on $7\leq n\leq 10$ vertices has at least 5 edges, etc.

Update 2.

Again, after considering some more, I now think it's clearer to just remain in the setting of the hypergraph $G$ and forget about taking the dual.

For example, let's compare the $K_8$-design to the hypercube design with $D=3$. In the $K_8$-design, each edge is a sample (there are 28), each vertex is a test pooling the samples which are incident with that vertex (there are 8), each test pools 7 samples (since the degree of each vertex is 7), and each sample will be used twice (since $K_8$ is 2-uniform). As I mentioned in a comment, this is better than the $D=3$ hypercube design in every parameter. Also you can see that if exactly one sample is infected, say the edge $\{i,j\}$, then exactly two tests (test $i$ and test $j$) will come back positive.

For another example, let's compare the $K_{13}$-design to the hypercube design with $D=4$. The $D=4$ hypercube design handles 81 samples using 12 tests each of which has size 27 and each sample is used 4 times. The $K_{13}$-design handles 78 samples using 13 tests, but each test has size 12 and each sample is only used 2 times.

For a final example, let's compare the $K_{9,9}$-design (that is, a complete bipartite graph with 9 vertices in each part) to the $D=4$ hypercube design. The $K_{9,9}$-design handles 81 samples using 18 tests, each of which has size 9 and each sample is used 2 times; however, this design has the additional feature that if three tests come back positive, then we will know exactly which two samples are infected. Neither the $K_{13}$-design, nor the $D=4$ hypercube design have that property.

Update 3

Given this alternate way of thinking about pooling designs, the detection capacity of $G$ can be defined as the largest integer $c$ such that no edge $e\in E(G)$ is contained in the union of at most $c$ edges of $E(G)\setminus \{e\}$. So if we want a pooling design with testing capacity $c$ which uses $t$ tests, we want a hypergraph on $t$ vertices with as many edges as possible such that no edge $e\in E(G)$ is contained in the union of at most $c$ edges of $E(G)\setminus \{e\}$. It turns out that this problem was studied in Erdős, Paul; Frankl, P.; Füredi, Z., Families of finite sets in which no set is covered by the union of (r) others, Isr. J. Math. 51, 79-89 (1985). ZBL0587.05021.


If you are thinking about the realistic problem for COVID-19, then it is different from your mathematical question. I tried to make a summary about the real question: https://arxiv.org/pdf/2005.02388.pdf