In the estimation of proportions by pooled testing, the MLE is biased, and several methods of correcting the bias have been presented in previous studies. We propose a new estimator based on the bias correction method introduced by Firth (Biometrika 80:27–38, 1993), which uses a modification of the score function, and we provide an easily computed Newton–Raphson iteration for it. Our proposed estimator is almost unbiased across a range of problems and is superior to existing methods. We show that for equal pool sizes the new estimator is equivalent to the estimator proposed by Burrows (Phytopathology 77:363–365, 1987). The performance of our estimator is examined using pooled testing problems encountered in plant disease assessment and prevalence estimation of mosquito-borne viruses.

Pooled testing (also known as group testing) occurs when individuals from a population are pooled together and tested as a group for the presence of an attribute, usually a pathogen. Since its introduction to the statistical literature by

Research in pooled testing diverges into two areas—classification, in which the purpose is to identify the positive units, and estimation, where the aim is to estimate the proportion of positives (

The maximum likelihood estimator (MLE) of

It is important that less biased alternatives to the MLE be available for the wide range of pooled testing scenarios that occur in practice. In this paper we describe estimators which are almost unbiased, and also have smaller mean squared error than the MLE. We first consider work that has already been done in this area, and then propose a new estimator based on the bias reduction method introduced by

Suppose that for i = 1,…, d, n_{i} pools of size m_{i} are tested, of which X_{i} = x_{i} pools are positive. Assuming that the individuals in the pools follow i.i.d. Bernoulli distributions with parameter p, the X_{i} are independent binomial random variables with parameters n_{i} and 1 − (1 − p)^{m_{i}}, with n_{i} and m_{i} assumed fixed and known. The log-likelihood is therefore
_{1},…, _{d}). The MLE of
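The model above can be made concrete with a short numerical sketch. It assumes the binomial log-likelihood implied by that model, with the additive constant dropped: for n_i pools of size m_i of which x_i are positive, l(p) = Σ { x_i log[1 − (1 − p)^{m_i}] + m_i (n_i − x_i) log(1 − p) }. The function names and the use of bisection on the score are choices made for this illustration only.

```python
import math

def pool_positive_prob(p, m):
    """P(a pool of size m tests positive) = 1 - (1 - p)^m, computed stably."""
    return -math.expm1(m * math.log1p(-p))

def log_likelihood(p, sizes, pools, positives):
    """Binomial log-likelihood for pooled testing, additive constant dropped."""
    total = 0.0
    for m, n, x in zip(sizes, pools, positives):
        if x > 0:
            total += x * math.log(pool_positive_prob(p, m))
        total += (n - x) * m * math.log1p(-p)  # negative pools
    return total

def scaled_score(p, sizes, pools, positives):
    """(1 - p) times the score; strictly decreasing in p, same root as the score."""
    return sum(m * (x / pool_positive_prob(p, m) - n)
               for m, n, x in zip(sizes, pools, positives))

def mle(sizes, pools, positives, iters=200):
    """Maximum likelihood estimate of p, found by bisection (illustrative choice)."""
    if all(x == 0 for x in positives):
        return 0.0                      # no positive pools
    if all(x == n for x, n in zip(positives, pools)):
        return 1.0                      # every pool positive
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if scaled_score(mid, sizes, pools, positives) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 8 pools of 20 and 8 pools of 5, with 4 and 0 positive pools respectively
print(round(mle([20, 5], [8, 8], [4, 0]), 3))
```

With sizes listed as (20, 5), the outcome (4, 0) gives an estimate of about 0.025, in line with the MLE row of the table of estimates, and the all-positive outcome (8, 8) returns 1.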

^{−1}) is eliminated when

For equal pool sizes, therefore, the Burrows correction produces an estimator which, for all practical purposes, is effectively unbiased. This perhaps explains the lack of proposed alternatives, though there have been a few. One is the jackknife, mentioned above; another is an empirical Bayes estimator proposed by
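For readers who want to try the equal-pool-size case numerically, the sketch below assumes the usual closed form of the Burrows estimator, 1 − [1 − x/(n + ν)]^{1/m} with ν = (m − 1)/(2m), alongside the closed-form MLE 1 − (1 − x/n)^{1/m}. The function and argument names are illustrative.

```python
def mle_equal_pools(m, n, x):
    """MLE of p when n pools of common size m yield x positive pools."""
    return 1.0 - (1.0 - x / n) ** (1.0 / m)

def burrows_estimate(m, n, x):
    """Burrows-type corrected estimate, assuming nu = (m - 1) / (2m)."""
    nu = (m - 1) / (2.0 * m)
    return 1.0 - (1.0 - x / (n + nu)) ** (1.0 / m)

# Example: 10 pools of 10 with 3 positive pools
print(mle_equal_pools(10, 10, 3), burrows_estimate(10, 10, 3))
```

Because ν is positive whenever m > 1, the corrected estimate is always smaller than the MLE for x > 0, and it stays below 1 even when every pool is positive.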

The only serious competitor to the Burrows estimator is the general bias correction described by _{i} and a single parameter, which matches the usual pooled testing framework we study here. Except for terms of

For equal pool sizes,

The Gart correction has the disadvantage of not providing a result when

For unequal pool sizes, the Gart correction was also effective. However, their main evaluation involved only one pooled testing procedure, comprising 8 pools of 20 and 8 pools of 5. The context for that particular evaluation was the estimation of virus prevalence in a carnation population, from which 200 plants were sampled, and tested in pools using ELISA. In the present study we evaluate a range of pooled testing scenarios, including larger ones typical of the monitoring of mosquito-borne viruses.

Because of the effectiveness of the Burrows correction for equal pool sizes, an extension of it to unequal pool sizes is desirable. No extension or generalization has yet been derived, however, as even obtaining

_{i} = _{i} −_{i} as the number of negative pools for _{i} in (_{i} + _{i} and _{i} by _{i} + _{i}, where _{i} must equal _{i}, or why either quantity has to equal

To solve _{k} and _{k+1} is less than some desired tolerance. For the case of all positive pools, the starting value for the iteration should be
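The following sketch shows one way to compute a Firth-type estimate for this model. It is derived here from Firth's general expected-information adjustment, not copied from the iteration described above: for independent binomial counts with success probabilities π_i(p) = 1 − (1 − p)^{m_i}, the adjustment works out to A(p) = Σ n_i π_i′ π_i″ / [π_i (1 − π_i)] / (2 I(p)), and multiplying the modified score by (1 − p) gives the objective coded below. Bisection is used in place of Newton–Raphson for robustness, and all names are assumptions of this illustration.

```python
import math

def firth_objective(p, sizes, pools, positives):
    """(1 - p) times the Firth-modified score for pooled testing.

    Equals sum_i m_i (x_i / pi_i - n_i) minus half a weighted average of
    (m_i - 1), with weights n_i m_i^2 (1 - p)^(m_i - m_min) / pi_i.
    """
    q = 1.0 - p
    m_min = min(sizes)
    score_part = 0.0
    w_sum = 0.0
    w_m_sum = 0.0
    for m, n, x in zip(sizes, pools, positives):
        pi = -math.expm1(m * math.log1p(-p))    # 1 - (1 - p)^m
        score_part += m * (x / pi - n)
        w = n * m * m * q ** (m - m_min) / pi   # scaled Fisher-information term
        w_sum += w
        w_m_sum += w * (m - 1)
    return score_part - 0.5 * w_m_sum / w_sum

def firth_estimate(sizes, pools, positives, iters=200):
    """Root of the modified score on (0, 1), located by bisection."""
    if all(x == 0 for x in positives):
        return 0.0                              # objective is negative everywhere
    lo, hi = 1e-12, 1.0 - 1e-12
    if firth_objective(hi, sizes, pools, positives) >= 0:
        return 1.0                              # no sign change below 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if firth_objective(mid, sizes, pools, positives) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# All 16 pools positive: 8 pools of 20 and 8 pools of 5
print(round(firth_estimate([20, 5], [8, 8], [8, 8]), 3))
```

For a single pool size the objective reduces to m(x/π − n) − (m − 1)/2, whose root is the Burrows form 1 − [1 − x/(n + (m − 1)/(2m))]^{1/m}. For the all-positive outcome with 8 pools of 20 and 8 pools of 5 the sketch returns roughly 0.455, where the MLE is 1. Bisection only needs a sign change of the objective, so it does not depend on a good starting value.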

For equal pool sizes, expressions

We now compare Gart’s and Firth’s bias correction methods in some detail using the pooled testing example described above. There were 200 individuals grouped into 8 pools of 20 and 8 pools of 5, for which we adopt the notation ^{n} = 200 : 5^{8} 20^{8}. It is useful to test the methods on a small example such as this, because we cannot rely on asymptotic properties to rescue them from poor performance. It is instructive first to examine the estimates themselves for a range of outcomes; these are presented in

For evaluation of the bias, there is not much to be gained by examining the entire range of
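Exact bias and RMSE for a small design can be obtained by enumerating every possible outcome and weighting each estimate by its binomial probability. The sketch below does this for the MLE, assuming independent binomial pool counts; the helper names and the bisection solver are illustrative, and any other estimator with the same signature could be passed in.

```python
import itertools
import math

def pool_positive_prob(p, m):
    """P(a pool of size m tests positive) = 1 - (1 - p)^m."""
    return -math.expm1(m * math.log1p(-p))

def mle(sizes, pools, positives, iters=200):
    """MLE of p by bisection on (1 - p) times the score (illustrative solver)."""
    if all(x == 0 for x in positives):
        return 0.0
    if all(x == n for x, n in zip(positives, pools)):
        return 1.0
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(m * (x / pool_positive_prob(mid, m) - n)
                for m, n, x in zip(sizes, pools, positives))
        if g > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_mean_and_rmse(estimator, p, sizes, pools):
    """Expected value and RMSE of an estimator, summing over all outcomes."""
    probs = [pool_positive_prob(p, m) for m in sizes]
    mean = 0.0
    mse = 0.0
    # Every combination of positive-pool counts, one count per pool size
    for outcome in itertools.product(*(range(n + 1) for n in pools)):
        weight = 1.0
        for x, n, pi in zip(outcome, pools, probs):
            weight *= math.comb(n, x) * pi ** x * (1.0 - pi) ** (n - x)
        estimate = estimator(sizes, pools, outcome)
        mean += weight * estimate
        mse += weight * (estimate - p) ** 2
    return mean, math.sqrt(mse)

mean, rmse = exact_mean_and_rmse(mle, 0.01, [20, 5], [8, 8])
print(mean, 100.0 * (mean - 0.01) / 0.01, rmse)
```

For 8 pools of 20 and 8 pools of 5 there are only 81 outcomes, so the enumeration is immediate; the work grows with the product of (number of pools + 1) over the pool sizes.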

We now consider a “medium-sized” example, representative of some procedures used by the CDC in assessing virus infection rates in mosquitoes. This example has 5 pools of 5, 5 pools of 10, 5 pools of 25, and 6 pools of 50, which we write ^{n} = 500 : 5^{5} 10^{5} 25^{5} 50^{6}.

Another medium-sized example is the problem described by ^{n} = 700 : 100^{7}. We do not provide details of the bias here, but overall Firth’s estimator performs better than Gart’s. At

We now consider a range of larger examples, with ^{n} and the resulting bias patterns. One of the plots arises from equal pool sizes, two of them from 2 pool sizes, two from 3 pool sizes, and one from 4.

Some trends emerge in these results. The most obvious and important is that, while the mean percentage (absolute) bias is small for both methods (generally

The “worst” bias (i.e., at the highest prevalence consistent with the testing procedure) is always much smaller for Firth’s method. It is always negative for Gart and usually positive for Firth. In percentage terms, the difference between the methods for worst bias decreases with increasing number of pools and increases with average pool size.

The average RMSE is always very slightly larger for Firth’s method, but the difference is of no practical consequence. For either method, it is generally worse (around 0.03) for procedures involving small pool sizes. However, this is still only about half the corresponding RMSE for the MLE.

We have considered bias correction in estimation of proportions by pooled testing, in which the MLE is clearly unacceptably biased for routine applications. We have proposed a new estimator based on the general bias reduction method applied to MLEs described by

Firth’s method has been applied to a range of estimation problems. One study of interest in the current context is that of

If MSE or RMSE (which is composed largely of variance here) were used as the main criterion for choosing an estimator, we might place the Gart and Firth methods on an equal footing. As

We have assumed in this study that positive and negative bias are of equal detriment to an estimator. The fact that the corrected estimator has negative bias for Gart’s method (i.e., it is a slight over-correction) and generally positive bias for Firth’s method has therefore not been a consideration in recommending the Firth method.

Computation is not a major issue in deciding on an appropriate estimator. Estimation of ^{n} = 1000 : 5^{10} 10^{10} 25^{10} 50^{12}, for which there are 17303 outcomes. If the number of different group sizes was large, this was accentuated. However, the computation of the estimates themselves took very little time, and so provided a practitioner has access to statistical software, bias-corrected estimates can be found readily. R code to implement the methods in this paper is available from the authors, and for Firth’s estimator, R code implementing the Newton–Raphson iteration is provided in an online supplement accompanying the article at the journal’s website.
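The figure of 17303 outcomes is consistent with counting every combination of positive-pool numbers, that is, the product of (number of pools + 1) over the distinct pool sizes. A minimal check, with illustrative names:

```python
import math

def count_outcomes(pools):
    """Number of distinct outcomes: each pool size i contributes 0..n_i positives."""
    return math.prod(n + 1 for n in pools)

# 10 pools of 5, 10 pools of 10, 10 pools of 25, and 12 pools of 50
print(count_outcomes([10, 10, 10, 12]))
```

This count is what drives the cost of exact bias calculations, since one estimate is needed per outcome.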

Bias and root mean squared error of estimators corrected by either Gart’s or Firth’s method, for pooled testing with ^{n} = 500 : 5^{5} 10^{5} 25^{5} 50^{6}. Gart=

Bias of estimators over

Estimates of

Column headings give the number of positive groups (x_{1}, x_{2}).

Method | (1, 2) | (4, 0) | (2, 5) | (3, 7) | (6, 4) | (5, 7) | (7, 5) | (7, 8) | (8, 7) | (8, 8) |
---|---|---|---|---|---|---|---|---|---|---|
MLE | 0.016 | 0.025 | 0.042 | 0.067 | 0.085 | 0.099 | 0.128 | 0.205 | 0.341 | 1 |
Gart | 0.015 | 0.024 | 0.039 | 0.062 | 0.079 | 0.091 | 0.116 | 0.180 | 0.291 | – |
Firth | 0.015 | 0.024 | 0.040 | 0.064 | 0.080 | 0.093 | 0.118 | 0.187 | 0.296 | 0.455 |

Bias of estimators corrected by either Gart’s or Firth’s method, for testing 8 pools of 20 and 8 pools of 5.

True proportion | MLE mean | MLE % bias | MLE RMSE | Gart mean | Gart % bias | Gart RMSE | Firth mean | Firth % bias | Firth RMSE |
---|---|---|---|---|---|---|---|---|---|
0.01 | 0.0105 | 4.6 | 0.0077 | 0.0100 | −0.06 | 0.0074 | 0.0100 | 0.13 | 0.0074 |
0.02 | 0.0210 | 5.0 | 0.0115 | 0.0200 | −0.07 | 0.0108 | 0.0200 | 0.15 | 0.0109 |
0.03 | 0.0317 | 5.5 | 0.0150 | 0.0300 | −0.08 | 0.0139 | 0.0301 | 0.17 | 0.0139 |
0.04 | 0.0424 | 6.0 | 0.0184 | 0.0400 | −0.09 | 0.0168 | 0.0401 | 0.19 | 0.0168 |
0.05 | 0.0533 | 6.6 | 0.0219 | 0.0499 | −0.10 | 0.0196 | 0.0501 | 0.22 | 0.0197 |
0.07 | 0.0755 | 7.8 | 0.0297 | 0.0699 | −0.14 | 0.0256 | 0.0702 | 0.25 | 0.0258 |
0.10 | 0.1099 | 9.9 | 0.0450 | 0.0998 | −0.22 | 0.0357 | 0.1003 | 0.25 | 0.0359 |
0.15 | 0.1715 | 14.3 | 0.0927 | 0.1494 | −0.41 | 0.0554 | 0.1501 | 0.09 | 0.0558 |
0.20 | 0.2448 | 22.4 | 0.1703 | 0.1983 | −0.85 | 0.0752 | 0.1994 | −0.30 | 0.0758 |
0.25 | 0.3363 | 34.5 | 0.2586 | 0.2459 | −1.64 | 0.0910 | 0.2474 | −1.05 | 0.0913 |
0.30 | 0.4447 | 48.2 | 0.3391 | 0.2910 | −3.00 | 0.1003 | 0.2927 | −2.42 | 0.1000 |

Mean percentage bias, RMSE, and bias (×10^{4}) at

Total individuals | Pool sizes (exponent = number of pools) | Highest prevalence | Gart mean % bias | Gart mean RMSE | Gart bias (×10^{4}) at highest prevalence | Firth mean % bias | Firth mean RMSE | Firth bias (×10^{4}) at highest prevalence |
---|---|---|---|---|---|---|---|---|
500 | 5^{100} | 0.506 | 0.020 | 0.0288 | −9.81 | 0.015 | 0.0289 | 2.12 |
500 | 10^{50} | 0.248 | 0.085 | 0.0220 | −13.04 | 0.032 | 0.0222 | 0.57 |
500 | 20^{25} | 0.103 | 0.224 | 0.0137 | −9.80 | 0.066 | 0.0139 | 0.13 |
500 | 50^{10} | 0.027 | 0.735 | 0.0063 | −4.81 | 0.200 | 0.0065 | 0.22 |
500 | 100^{5} | 0.008 | 2.069 | 0.0033 | −2.46 | 0.553 | 0.0034 | 0.27 |
1000 | 20^{50} | 0.133 | 0.122 | 0.0123 | −9.46 | 0.033 | 0.0125 | −0.10 |
1000 | 100^{10} | 0.013 | 0.791 | 0.0033 | −2.52 | 0.204 | 0.0034 | 0.10 |
1000 | 5^{200} | 0.569 | 0.012 | 0.0231 | −8.53 | 0.009 | 0.0232 | 1.75 |
5000 | 5^{1000} | 0.687 | 0.004 | 0.0135 | −6.17 | 0.003 | 0.0135 | 1.21 |
1000 | 5^{100} 50^{10} | 0.506 | 0.023 | 0.0286 | −9.81 | 0.019 | 0.0287 | 2.12 |
1000 | 25^{20} 50^{10} | 0.078 | 0.251 | 0.0099 | −8.61 | 0.067 | 0.0101 | −0.20 |
5000 | 5^{500} 50^{50} | 0.641 | 0.006 | 0.0170 | −7.09 | 0.005 | 0.0171 | 1.41 |
5000 | 25^{100} 50^{50} | 0.132 | 0.076 | 0.0083 | −7.90 | 0.018 | 0.0084 | −0.33 |
1000 | 10^{20} 25^{8} 50^{12} | 0.180 | 0.203 | 0.0219 | −15.18 | 0.024 | 0.0222 | −1.12 |
1000 | 10^{50} 25^{12} 50^{4} | 0.248 | 0.085 | 0.0207 | −13.12 | 0.026 | 0.0209 | 0.23 |
1000 | 5^{100} 10^{40} 25^{4} | 0.507 | 0.019 | 0.0262 | −9.74 | 0.014 | 0.0263 | 1.98 |
5000 | 10^{200} 25^{60} 50^{30} | 0.344 | 0.032 | 0.0155 | −11.33 | 0.010 | 0.0156 | 0.18 |
1000 | 5^{10} 10^{10} 25^{10} 50^{12} | 0.261 | 0.199 | 0.0335 | −19.35 | 0.037 | 0.0339 | −0.46 |
1000 | 5^{20} 10^{40} 25^{12} 50^{4} | 0.350 | 0.065 | 0.0283 | −14.37 | 0.043 | 0.0286 | 3.49 |
5000 | 10^{50} 25^{40} 50^{30} 100^{20} | 0.248 | 0.080 | 0.0187 | −13.17 | 0.019 | 0.0188 | −0.28 |