This is an Open Access article distributed under the terms of the Creative Commons Attribution License (

Cluster surveys are frequently used to measure key nutrition and health indicators in humanitarian emergencies. The survey design of 30 clusters of 7 children (30 × 7) was initially proposed by the World Health Organization for measuring vaccination coverage, and later a design of 30 clusters of 30 children (30 × 30) was introduced to measure acute malnutrition in emergency settings. Recently, designs of 33 clusters of 6 children (33 × 6) and 67 clusters of 3 children (67 × 3) have been proposed as alternatives that enable measurement of several key indicators with sufficient precision, while offering substantial savings in time. This paper explores expected effects of using 67 × 3, 33 × 6, or 30 × 7 designs instead of a "standard" 30 × 30 design on precision and accuracy of estimates, and on time required to complete the survey.

The 67 × 3, 33 × 6, and 30 × 7 designs are expected to be more statistically efficient for measuring outcomes having high design effects (e.g., vaccination coverage, vitamin A distribution coverage, or access to safe water sources), and less efficient for measuring outcomes with more within-cluster variability, such as global acute malnutrition or anemia. Because of small sample sizes, these designs may not provide sufficient levels of precision to measure crude mortality rates. Given the small number (3 to 7) of survey subjects per cluster, it may be hard to select representative samples of subjects within clusters.

The smaller sample size in these designs will likely result in substantial time savings. The magnitude of the savings will depend on several factors, including the average travel time between clusters. The 67 × 3 design will provide the least time savings. The 33 × 6 and 30 × 7 designs perform similarly to each other, both in terms of statistical efficiency and in terms of time required to complete the survey.

Cluster designs discussed in this paper may offer substantial time and cost savings compared to the traditional 30 × 30 design, and may provide acceptable levels of precision when measuring outcomes that have high intracluster homogeneity. Further investigation is required to determine whether these designs can consistently provide accurate point estimates for key outcomes of interest. Organizations conducting cluster surveys in emergency settings need to build their technical capacity in survey design to be able to calculate context-specific sample sizes individually for each planned survey.

Over the past three decades, field surveys using a cluster sample design have become a standard, popular, and widely used method of measuring key nutrition and health indicators in humanitarian emergencies [

In this paper, we will explore the expected statistical effects on precision of using the newly proposed 33 × 6 and 67 × 3 designs as well as the "old" 30 × 7 EPI design, as compared with the conventional 30 × 30 design. We will then discuss the situations and outcomes for which these designs may be most statistically efficient (i.e., may result in the least loss of precision compared to a 30 × 30 design); offer some thoughts on the potential of these designs to introduce bias into the prevalence estimates; and explore situations where these designs can offer the most substantial savings of time and resources.

A simplified formula used for calculating the width of the one side of the two-sided 95% confidence interval in cluster surveys is [

where d = the width of one side of the two-sided 95% confidence interval

n = sample size

p = the prevalence of the outcome being measured

DEFF = design effect

Z = z value (generally 1.96)

The design effect is the ratio of the variance of the estimate under the actual (for example, cluster) design to the variance of the estimate assuming that the same data have been collected by simple random sampling [

Therefore, the smaller the cluster size, the smaller the design effect. Decreasing the design effect through decreasing the cluster size may, to a certain degree, offset the loss of precision resulting from decreasing the overall sample size [

Using this value of Rho, we can then calculate an expected DEFF for 33 × 6, 67 × 3, and 30 × 7 designs:

_{33 × 6 }= 1 + 0.0345*(6-1) = 1.1725

_{67 × 3 }= 1 + 0.0345*(3-1) = 1.0690

_{30 × 7 }= 1 + 0.0345*(7-1) = 1.2070

As can be seen, expected design effects for these three designs are substantially lower than the design effect of 2 observed in a 30 × 30 survey.

We can now calculate the expected relative change in the width of the confidence interval of the 33 × 6 design compared to the 30 × 30 design as a ratio of sqrt(DEFF_{33 × 6}/n_{33 × 6}) to sqrt(DEFF_{30 × 30}/n_{30 × 30}), since p and Z do not change:

Therefore, the 95% confidence interval in the 33 × 6 design is expected to be 63% wider than the 95% confidence interval in a 30 × 30 design.

Table

Expected change in precision for 67 × 3, 33 × 6 and 30 × 7 designs compared with 30 × 30 design

Rho | 67 × 3 design | 33 × 6 design | 30 × 7 design | ||||

Expected | ^{1 } | Expected | ^{1 } | Expected | ^{1 } | ||

1 | 0 | 1 | +111.6 | 1 | +113.2 | 1 | +107.0 |

1.25 | 0.0086 | 1.0172 | +90.9 | 1.0431 | +94.7 | 1.0517 | +89.9 |

1.5 | 0.01724 | 1.0345 | +75.7 | 1.0862 | +81.4 | 1.1034 | +77.6 |

2 | 0.03448 | 1.0690 | +54.7 | 1.1724 | +63.2 | 1.2069 | +60.8 |

2.5 | 0.05172 | 1.1034 | +40.6 | 1.2586 | +51.3 | 1.3103 | +49.9 |

3 | 0.06897 | 1.1379 | +30.3 | 1.3448 | +42.7 | 1.4138 | +42.1 |

4 | 0.10345 | 1.2069 | +16.2 | 1.5172 | +31.3 | 1.6207 | +31.8 |

5 | 0.13793 | 1.2759 | +6.9 | 1.6897 | +23.9 | 1.8276 | +25.2 |

7 | 0.20690 | 1.4138 | -4.9 | 2.0345 | +14.9 | 2.2414 | +17.1 |

10 | 0.31034 | 1.6207 | -14.8 | 2.5517 | +7.7 | 2.8621 | +10.8 |

^{1 }Relative expected change in the width of the 95% confidence interval compared to the 30 × 30 design

Figure

The width of the 95% confidence interval for different cluster designs given 15% prevalence of the measured outcome.

In practical terms, these results mean that 67 × 3, 33 × 6, and 30 × 7 designs are most statistically efficient for measuring outcomes or indicators that have high intracluster correlation, and that they are not as statistically efficient (resulting in substantial loss of precision compared to that of the 30 × 30 design) for outcomes with a low degree of clustering. Unfortunately, the body of literature that provides guidance on design effects for different outcomes measured in cluster surveys in emergencies is very limited, and design effects are not routinely included in survey reports. Two recent papers [

As a result of small overall sample sizes (around 200 households), 67 × 3, 33 × 6, and 30 × 7 designs are not likely to provide sufficient levels of precision to measure crude or under-5 mortality rates. For example, assuming that the true under-5 mortality rate in the population is 1.0 per 10,000 per day, a recall period of 3 months, and on average 1 child under 5 years of age per household, the expected number of under-5 deaths detected in the 33 × 6, 30 × 7, or 67 × 3 surveys is only about 1.8. This will result not only in a wide confidence interval, even if DEFF for mortality is low, but also in unstable point estimates.

It is also important to consider the minimum desired level of precision in each concrete situation. For example, if in the scenario presented in Figure

One important question is whether designs in which the number of survey subjects per cluster is low (3 in 67 × 3 to 7 in 30 × 7) can reliably ensure that the subjects sampled in each cluster are representative of that cluster. With only 3 survey subjects per cluster, such representativeness may be hard to achieve, especially if the EPI method, rather than random sampling, is used for selecting households or subjects within a cluster [

One way to assess the seriousness of this problem is to compare the prevalence estimates obtained from a 67 × 3 or a 33 × 6 survey to those obtained by using a 30 × 30 design, given that the same population is sampled. One recent paper [

One attractive feature of the survey designs discussed in this paper is the dramatically reduced sample size (from 900 in a 30 × 30 survey to around 200). Time and resource savings resulting from this more than four-fold decrease in sample size will likely be one of the main arguments for wider field application of these designs.

In a simplified form, the time needed for a team to complete a survey in the field (assuming that teams do not need to return to their base at the end of each day) can be expressed as:

_{1 }+ (k+1)*t_{2 }+ n*t_{3}

where:

T – total time to complete a survey for one team

t_{1 }– average time needed for initial introduction of team to the community in each cluster and initial selection of households within cluster

t_{2 }– average time of travel between clusters or between cluster and base

t_{3 }– average time to complete survey for one survey subject/household, including travel between houses

k – number of clusters

n – total sample size

Mathematically, it can be shown that relative time savings in 67 × 3, 33 × 6, and 30 × 7 designs, compared to 30 × 30 design, increase with increasing t_{3 }and decrease with increasing t_{1 }and t_{2}. Since the travel time is likely to vary the most from survey to survey, it would be interesting to explore how the time savings change, depending on travel time. Figure _{2}), varying from 15 minutes to 6 hours (assuming both t_{1 }and t_{3 }to be 30 minutes). As can be seen, the biggest time savings (3 to 4 times less than for the 30 × 30 design) are expected when the average travel times between clusters are small (15 minutes to 1 hour). With increasing average travel time between clusters, relative time savings decrease, especially for the 67 × 3 design, which at longer travel times (5–6 hours) offers little time advantage over the 30 × 30 design. For longer travel times, the 67 × 3 cluster design may present an additional disadvantage of higher fuel costs, since the teams need to travel long distances to 66 instead of 30 different locations. The 33 × 6 and 30 × 7 designs produce very similar time savings, and at longer travel times (4–6 hours) they would still require less than half the time needed to complete a 30 × 30 survey. Therefore, in situations where travel times between clusters are long, logistically 33 × 6 and 30 × 7 designs would be preferable to the 67 × 3 design.

Time needed to complete a survey for different cluster designs depending on average travel time between clusters.

It should be emphasized, however, that the scenario presented above is fairly simple in that it assumes that the team does not need to return to the base at the end of each day and can spend nights in the field, either in or between the clusters it is surveying. Also, it assumes that introduction of the team to the community and initial household selection within a cluster are not very time-consuming. In cases when mapping or enumeration of the households within each cluster needs to be carried out as part of random selection procedures, this initial phase will likely take much longer than 30 minutes. The quality of roads in the survey area and vehicles available to survey teams may have a substantial impact on travel time. Therefore, a time estimation exercise that takes into account specific circumstances of each planned survey needs to be carried out in the planning phase so as to evaluate expected time requirements associated with each of the alternative designs.

Overall, cluster designs with substantially decreased cluster size and overall sample size, such as the 67 × 3, 33 × 6, and 30 × 7 designs discussed in this paper, may offer substantial time and cost savings compared to the traditional 30 × 30 design, and they may provide levels of precision that are acceptable (or at least comparable to the 30 × 30 design) for measuring some common outcomes of interest in humanitarian emergencies. This is particularly the case for outcomes that tend to have high intracluster homogeneity, such as vaccination coverage, vitamin A supplementation coverage, or access to safe water. Using these designs to measure outcomes that are normally less clustered (e.g, global acute malnutrition) may result in a sizeable loss in precision (resulting in confidence intervals 1.5 to 2 times wider than those obtained with the 30 × 30 design), and may therefore need to be carefully considered vis-à-vis context-specific minimum requirements for precision. Because of the small overall sample size, these designs are not likely to provide sufficient precision to measure crude or under-5 mortality rates. The question of the ability of these designs to consistently provide accurate (or unbiased) point estimates for key indicators routinely measured in humanitarian emergencies remains open, and it is subject to further investigation. Good training and supervision of the teams implementing these designs in the field and use of random rather than EPI methods for household selection will be important for ensuring selection of representative samples within clusters.

Because of substantially reduced overall sample size, these designs will likely result in substantial (2–4 times compared to 30 × 30 design) time savings. The magnitude of these time savings will depend on several factors, including the average time needed to complete a survey for one survey subject or household, the average time to select households within a cluster, and the average time needed to travel between clusters. The 67 × 3 design will provide the least time savings relative to the 30 × 30 design, and it may be logistically difficult to implement if travel distances between clusters are large. The 33 × 6 and 30 × 7 designs are expected to perform similarly to each other, both in terms of statistical efficiency and in terms of time required to complete the survey.

Finally, despite several proposed "standard" designs that can provide the minimum required level of accuracy and precision, some of which were discussed in this paper, the best solution is to calculate sample sizes individually for each planned survey, taking into account the specific indicators being measured, context-specific levels of precision required, logistic and security concerns, availability of suitable personnel, and other factors. Many surveys in emergencies strive to measure multiple outcomes, including nutritional status, mortality, water and sanitation indicators, access to feeding programs and food aid, among others. In these situations, sample sizes to achieve acceptable levels of precision need to be calculated for each of the key indicators [

The author declares that he has no competing interests.

OOB designed the study, performed analysis, wrote and revised the manuscript.

I thank Curtis Blanton, M.S., for his valuable comments to the earlier version of this manuscript.

Disclaimer:

The findings and conclusions in this report are those of the author and do not necessarily represent the views of the Centers for Disease Control and Prevention.