Respondent‐driven sampling (RDS) is a network‐based method for sampling populations for whom a sampling frame is not available (

RDS is primarily used to estimate the prevalences of traits such as diseases and risk factors. Unbiased point and variance estimates of such prevalences from survey samples classically require calculating each participant’s probability of being sampled (“inclusion probability”). Because a sampling frame is not available, hidden population members’ inclusion probabilities cannot be calculated using standard approaches. Therefore, statistical inference from samples collected via RDS relies on models approximating the sampling process that incorporate information about the sample members’ social networks and information observed during the recruitment process.

Multiple evaluations of RDS point estimators and violations of RDS assumptions have been conducted, but significantly less work has examined RDS variance estimators (

Past research on RDS variance estimation suggested that RDS confidence intervals provide unacceptably low coverage rates and that RDS may have extremely large design effects when applied to hidden populations of public health interest (

RDS begins with researchers choosing a small number (usually 5 to 10) of “seed” population members. The seeds are interviewed and given a small number (usually 3 to 5) of uniquely numbered coupons with which they can recruit population members they know into the sample. Recruited population members are interviewed and given coupons, and the process is repeated until the target sample size is reached. Participants are remunerated both for completing the survey questionnaire and for each eligible population member they recruit.

RDS survey questionnaires and associated biological tests provide data on many characteristics of interest. For the purposes of this article, without loss of generality, we will represent these variables of interest by a two‐valued trait, with values “with trait” and “without trait.” Populations sampled via RDS are connected via social network ties; we will refer to the set of persons, or “nodes,” and ties connecting them as the “population network.” We will refer to the number of ties each person has to other members of the population as that person’s “degree.”

Most RDS point estimators currently used are design‐based, including the Salganik‐Heckathorn (SH) (

Commonly used RDS variance estimators employ a bootstrap resampling approach that approximates the RDS design (

The variability of estimators is typically presented as a standard error or confidence interval (CI), the latter often derived from the former. In RDS, CIs are typically the metric of choice, as they provide an estimated range of values deemed plausible for the trait of interest. A properly calibrated method for computing level-α CIs produces intervals that capture the true population value for an estimand with probability at least (1 – α) (e.g., an α of .05 corresponds to a CI that includes the true population value in at least 95% of samples). CIs can be calculated from bootstrap variance estimates using a number of methods; the percentile and studentized bootstrap CI methods are most commonly used for RDS data (
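As a rough illustration of the two CI constructions named above, the following sketch applies them to a binary trait. It uses an ordinary independent bootstrap, not an RDS bootstrap that resamples along recruitment chains. The function names are hypothetical. The "studentized" interval is taken here to mean the point estimate plus or minus a normal quantile times the bootstrap standard deviation; that reading is an assumption, since the exact formula is not given in the text.

```python
import random
import statistics

def bootstrap_estimates(sample, n_boot=2000, seed=0):
    """Resample with replacement and return the bootstrap prevalences.

    Assumption: a plain i.i.d. bootstrap, NOT an RDS-specific bootstrap.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]

def percentile_ci(boot, alpha=0.05):
    """Percentile CI: empirical alpha/2 and 1 - alpha/2 quantiles."""
    ordered = sorted(boot)
    lo_idx = int((alpha / 2) * (len(ordered) - 1))
    hi_idx = int((1 - alpha / 2) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

def se_based_ci(point, boot, alpha=0.05):
    """Point estimate +/- z * bootstrap SD (assumed 'studentized' form)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    se = statistics.stdev(boot)
    return point - z * se, point + z * se

# Hypothetical sample: 30 of 300 respondents have the trait.
sample = [1] * 30 + [0] * 270
point = statistics.fmean(sample)
boot = bootstrap_estimates(sample)
pct_ci = percentile_ci(boot)
stud_ci = se_based_ci(point, boot)
```

The percentile interval follows the shape of the bootstrap distribution, whereas the SE-based interval is symmetric around the point estimate by construction. This is one reason the two methods can give different coverage when the bootstrap distribution is skewed.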

Evaluations of RDS point estimators have been conducted both with real RDS samples from non‐hidden populations and with simulated RDS samples, but the accuracy of RDS variance estimators can only be evaluated via simulation. This is because, while it is theoretically possible to know the true value of an estimand in the target population to which to compare point estimators, it is only possible to know the true variability of an estimator in a true population by conducting a large number of independent studies in the same population with the same structure, which is practically infeasible.

Evaluating RDS by simulation consists of three steps: (1) obtaining or creating a population network with certain characteristics, (2) simulating RDS on that network, and (3) applying RDS estimators to the trait of interest in the resulting samples. As these procedures are repeated many times, the resulting distribution of simulated estimates approaches the true sampling distribution of the estimators under the simulation conditions. Therefore, one is able to compare estimates of estimator uncertainty to “true” simulated levels of uncertainty.
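The three steps can be sketched in a few lines. This is a toy under stated assumptions, not the procedure used in this study: the network is a simple random graph and not an ERGM, every participant uses all available coupons, and repeated samples are drawn from a single network. The estimator shown weights each respondent by the inverse of his or her degree, the form commonly associated with the VH estimator. All function names and parameter values are hypothetical.

```python
import random

def make_network(n_nodes, mean_degree, prevalence, rng):
    """Step 1: toy random graph with a binary trait (assumption: no ERGM)."""
    neighbors = {i: set() for i in range(n_nodes)}
    n_edges = n_nodes * mean_degree // 2
    while n_edges > 0:
        a, b = rng.sample(range(n_nodes), 2)
        if b not in neighbors[a]:
            neighbors[a].add(b)
            neighbors[b].add(a)
            n_edges -= 1
    trait = {i: rng.random() < prevalence for i in range(n_nodes)}
    return neighbors, trait

def simulate_rds(neighbors, n_seeds, n_coupons, target, rng):
    """Step 2: recruit without replacement until the target size is reached."""
    connected = [v for v in sorted(neighbors) if neighbors[v]]
    queue = rng.sample(connected, n_seeds)
    seen = set(queue)  # everyone already holding a coupon or interviewed
    sampled = []
    while len(sampled) < target:
        if not queue:  # all chains died out: add one fresh seed
            fresh = rng.choice([v for v in connected if v not in seen])
            seen.add(fresh)
            queue.append(fresh)
        person = queue.pop(0)
        sampled.append(person)
        eligible = [v for v in sorted(neighbors[person]) if v not in seen]
        recruits = rng.sample(eligible, min(n_coupons, len(eligible)))
        seen.update(recruits)
        queue.extend(recruits)
    return sampled

def inverse_degree_estimate(sample, neighbors, trait):
    """Step 3: prevalence estimate weighting each respondent by 1/degree."""
    weights = {v: 1.0 / len(neighbors[v]) for v in sample}
    total = sum(weights.values())
    return sum(w for v, w in weights.items() if trait[v]) / total

rng = random.Random(1)
neighbors, trait = make_network(2000, 10, 0.2, rng)
true_prevalence = sum(trait.values()) / len(trait)
estimates = []
for _ in range(50):
    sample = simulate_rds(neighbors, n_seeds=5, n_coupons=3, target=200, rng=rng)
    estimates.append(inverse_degree_estimate(sample, neighbors, trait))
```

The spread of `estimates` across repetitions approximates the sampling variability of the estimator under these toy conditions, which is the quantity a variance estimator is meant to recover from a single sample.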

Our primary results evaluate the performance of RDS uncertainty estimation based on the performance of the CIs calculated from different point/variance estimator pairs (e.g., SH/Sal‐BS). An estimator pair’s CI

In addition to evaluating variance estimators, for comparison with previous research on RDS uncertainty estimation, we consider RDS

Typical DEs for many complex surveys that did not use RDS are between 1.5 and 2, but for some variables in some studies can range to 5 (

While the DE of a given RDS study in the real world is unknown because it cannot be calculated from the data, we can calculate the DEs for our simulations numerically. We refer to these as the “actual DEs” below. In addition to actual RDS DEs, which previous research has also calculated based on simulations, the DEs

Evaluating RDS via simulation requires obtaining or creating a population network from which to draw samples and simulating RDS on that network. Previous studies have simulated RDS both on real and synthetic population networks. RDS is used to study hidden populations, so an RDS simulation study’s population network should be as similar to real hidden population networks as possible. Unfortunately, complete data for hidden population networks is extremely rare. Complete network data is difficult and expensive to collect in any setting (

Hidden population network data are unavailable, so the real population networks in previous RDS simulation studies have come from a variety of sources (^{th} through 12^{th} grades (the “Add Health” study) (

Such institutional structures are not present for the vast majority of hidden population networks RDS is used to sample. RDS variance is known to be strongly positively related to homophily, the extent to which networks are assortative along characteristics of their members. These school networks demonstrate strong homophily by grade, resulting in networks that may have “bottlenecks” between population sub-groups (

Given the differences between the available population network data and the networks of hidden populations, we based both our simulated population networks and our simulated RDS sampling process on real RDS studies. To maximize the similarity of our simulations to RDS as it occurs in the field, our simulations are designed to reflect RDS as it was used to sample PWID by the CDC’s National HIV Behavioral Surveillance system (NHBS) in 2009 and 2012. NHBS sampled PWID in 20 U.S. cities in both 2009 and 2012 using a standard protocol, resulting in 40 RDS samples (

To create the simulated population networks for our study (step 1 in the 3-step process described above), we first estimated four characteristics of the PWID population in each NHBS city from each of the 40 NHBS samples: the prevalence and homophily for a two‐valued trait of public health interest; the estimated mean degree of population members; and differential activity (DA). Homophily is a measure of assortative mixing in the network defined as the proportion of ties in the network between two respondents who share a trait status relative to what would be expected by chance. DA is a measure of one group’s gregariousness compared to another and is defined as the ratio of the mean degrees of population members with and without the trait. Summary statistics of these characteristics can be found in ^{*} Using each of the 40 sets of characteristics, we then simulated 1,000 networks using exponential‐family random graph models (ERGMs) (
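To make the definitions of homophily and DA concrete, the sketch below computes both on a fully observed toy network. In the study these quantities were estimated from RDS samples, not computed from complete network data. The "expected by chance" baseline used here, random mixing in proportion to each group's share of tie ends, is an assumption; the function names and the six-node network are hypothetical.

```python
from collections import Counter

def degrees_from_edges(edges):
    """Count the number of ties attached to each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def differential_activity(deg, trait):
    """Mean degree of nodes with the trait divided by that of nodes without."""
    with_t = [deg[v] for v in range(len(trait)) if trait[v]]
    without_t = [deg[v] for v in range(len(trait)) if not trait[v]]
    return (sum(with_t) / len(with_t)) / (sum(without_t) / len(without_t))

def homophily(edges, trait):
    """Observed share of same-trait ties relative to a chance baseline.

    Assumption: chance = random mixing proportional to tie ends.
    """
    observed = sum(trait[a] == trait[b] for a, b in edges) / len(edges)
    ends_with = sum(trait[a] + trait[b] for a, b in edges) / (2 * len(edges))
    expected = ends_with ** 2 + (1 - ends_with) ** 2
    return observed / expected

# Hypothetical network: nodes 0 and 1 have the trait, nodes 2 to 5 do not.
trait = [True, True, False, False, False, False]
edges = [(0, 1), (0, 2), (2, 3), (3, 4), (4, 5), (2, 5)]
deg = degrees_from_edges(edges)
da = differential_activity(deg, trait)  # 1.5 / 2.25
h = homophily(edges, trait)             # (5/6) / 0.625
```

In this example DA is below 1 (nodes with the trait have fewer ties on average) and homophily is above 1 (same-trait ties are more common than the chance baseline), the same qualitative pattern as the mean values reported for the NHBS samples.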

We designed the RDS process (step #2 above) used in the simulations to match those observed in the NHBS samples by first measuring the following characteristics for each of the 40 NHBS samples: the sample size, the numbers of seeds with and without the trait, and the distribution of number of recruitments by sample members. Summary statistics of these characteristics can be found in ^{†}

For each of the 1,000 networks corresponding to a given NHBS sample, we simulated one RDS sample using the RDS package in the statistical software R (step #3 above) (

Our analysis compares the coverage rates of the 95% CIs for the four point/variance estimator pairs and two bootstrap CI methods when sampling was with and without replacement, where the coverage rates are calculated as the proportion of the simulations in which the CI contained the true population prevalence of the trait.
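The coverage calculation itself is simple. A minimal sketch, with a hypothetical function name and made-up intervals:

```python
def coverage_rate(intervals, true_value):
    """Proportion of CIs (lower, upper) that contain the true value."""
    hits = sum(lo <= true_value <= hi for lo, hi in intervals)
    return hits / len(intervals)

# Five hypothetical CIs for a trait whose true prevalence is 0.10.
intervals = [(0.05, 0.15), (0.08, 0.20), (0.11, 0.19), (0.02, 0.09), (0.07, 0.13)]
rate = coverage_rate(intervals, true_value=0.10)  # 3 of 5 intervals cover 0.10
```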

We calculate the actual DEs for our simulations numerically as the ratio of the variance of the distribution of point estimates across simulations to the SRS variance, where the SRS variance includes a finite population adjustment based on the proportion of the population that was sampled. We calculate the estimated DEs for our simulations as the ratio of the estimated variance to the SRS variance. Each actual DE is calculated from the variance of the point estimates across the 1,000 simulations for each population network. Each estimated DE is calculated from a specific estimator pair applied to a single sampling simulation. Because the actual DE varies in magnitude across population networks, we summarize the estimated DEs’ accuracy by calculating the ratio of each simulation’s estimated DE to the actual DE for that population network. We compare the actual DEs for the four point estimators and also compare the actual DEs to the DEs estimated by the RDS variance estimators.
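The DE calculations can be sketched as follows. The exact form of the finite population adjustment is not given in the text; the factor (1 − n/N) applied to p(1 − p)/n is an assumption, as are the function names and all numeric inputs.

```python
import statistics

def srs_variance(p, n, pop_size):
    """SRS variance of a proportion with an assumed (1 - n/N) adjustment."""
    return (1 - n / pop_size) * p * (1 - p) / n

def actual_de(point_estimates, srs_var):
    """Variance of point estimates across simulations over the SRS variance."""
    return statistics.variance(point_estimates) / srs_var

def estimated_de(estimated_variance, srs_var):
    """Variance estimate from one simulated sample over the SRS variance."""
    return estimated_variance / srs_var

# Hypothetical inputs: prevalence 0.10, n = 500, population of 10,000.
srs_var = srs_variance(0.10, 500, 10_000)
point_estimates = [0.08, 0.10, 0.12, 0.09, 0.11]  # one estimate per simulation
de_actual = actual_de(point_estimates, srs_var)
de_estimated = estimated_de(0.0003, srs_var)      # from a single simulation
accuracy_ratio = de_estimated / de_actual         # below 1 means too low
```

Because both DEs share the same SRS variance in the denominator, the accuracy ratio reduces to the estimated variance divided by the variance of the point estimates across simulations.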

^{‡} The left panel of

The SS/SS‐BS estimator pair had overall higher coverage than the other two RDS estimator pairs: it had only one instance of coverage below 90%, whereas the SH/Sal‐BS and VH/Sal‐BS coverages were below 90% in five instances. The NHBS sample corresponding to the instance with SS/SS-BS coverage below 90% (86.8%) has a trait prevalence of 0.034 and the smallest sample size (n=210) of all the NHBS samples.

The SH/Sal-BS and VH/Sal-BS had considerably worse coverage rates in 4 sets of simulations (

Given that the conditions varied considerably across the forty simulation sets, summary statistics such as the mean may mask meaningful variation in the coverages. Therefore, we calculated a summary measure of “acceptable coverage”.^{§} The Mean/SRS estimator had acceptable coverage in 5% of CIs. The SH/Sal‐BS and VH/Sal‐BS with studentized bootstrap estimator pairs produced acceptable coverage for 67.5% of CIs, and the SS/SS‐BS with percentile and studentized bootstrap CI methods produced acceptable coverages for 80% and 75% of CIs, respectively.
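The "acceptable coverage" summary can be computed as in the sketch below, which uses the 93% to 97% inclusive band given in the accompanying footnote. The function name and the coverage values are hypothetical.

```python
def acceptable_coverage_percent(coverages, low=93.0, high=97.0):
    """Percent of coverage rates falling in [low, high], inclusive."""
    ok = sum(low <= c <= high for c in coverages)
    return 100.0 * ok / len(coverages)

# Hypothetical coverage percentages for eight simulation sets.
coverages = [88.0, 93.0, 94.5, 97.0, 97.2, 91.9, 95.1, 96.3]
pct_acceptable = acceptable_coverage_percent(coverages)  # 5 of 8 in band
```

Unlike the mean coverage, this measure penalizes both undercoverage and overcoverage, so a set of coverages that averages 95% through offsetting errors does not score well.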

We conducted additional simulations to investigate the higher 95% CI coverage rates for the VH/Sal-BS in our analysis (all greater than 80%; see

^{¶}

The last row of

For with-replacement sampling, the VH/Sal-BS estimator pair shows much lower accuracy than all three without-replacement estimator pairs for the most stringent benchmark factor. It also has a high proportion of estimated DEs that are too low, with more than 79% of estimated DEs lower than the actual DE.
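The "within a factor" benchmarks and the share of estimates that are too low can be computed as in the following sketch. The function name and input values are hypothetical; a ratio is counted as within a factor k when it lies between 1/k and k, which matches the percentage ranges given in the table footnotes.

```python
def within_factor_summary(estimated_des, actual_de, factor):
    """Return (percent within factor, percent of those that are too low)."""
    ratios = [e / actual_de for e in estimated_des]
    inside = [r for r in ratios if 1 / factor <= r <= factor]
    pct_inside = 100.0 * len(inside) / len(ratios)
    if not inside:
        return pct_inside, float("nan")
    pct_low = 100.0 * sum(r < 1 for r in inside) / len(inside)
    return pct_inside, pct_low

# Hypothetical estimated DEs from eight simulations; the actual DE is 2.0.
estimated_des = [1.2, 1.5, 2.5, 3.5, 0.5, 2.2, 1.9, 4.5]
summary_15 = within_factor_summary(estimated_des, 2.0, factor=1.5)
summary_2 = within_factor_summary(estimated_des, 2.0, factor=2)
```

A "too low" share near 50% indicates errors balanced in both directions, whereas a share well above 50% indicates systematic underestimation of the DE, the anti-conservative direction.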

Our simulations suggest that the coverage of 95% CIs for RDS samples is usually above 90% (with no coverage rates above 97%). This is better than past work has suggested, demonstrating that reasonably accurate RDS variance estimation is feasible and that conclusions drawn from past analyses of RDS data that applied one of these estimators may well be reasonable in scenarios where RDS assumptions are met.

While the RDS estimators performed better than expected, the SRS variance estimator significantly underestimates the variability of RDS samples and provides very low coverage. Because of the complexity of RDS, it may be tempting to dispense with complicated inferential approaches and use the sample mean and SRS variance approximation. Our results show that this approach is likely to cause significant under-estimation of uncertainty and lead to misleading conclusions.

We found that the SS/SS‐BS estimator pair had overall higher coverage than the other two estimator pairs. The SS/SS-BS exhibited its lowest coverage when applied to a sample with lower prevalence and a smaller sample size than the other samples. In contrast, the SH/Sal‐BS and VH/Sal‐BS had lower coverage for samples with levels of differential activity much lower than those of the other samples in combination with higher levels of homophily than those of the other samples.

Note that the difference between the SS and VH estimators is a finite population adjustment that requires knowing the true size of the population, which is typically unavailable. The impact of error in the population size specified for the SS estimator in a given sample is a function of the true size of the population. The impact is relative, so the impact of a given absolute error in the specified population size will be larger for smaller population sizes (e.g., an error of 500 in the specified population size will have more impact when the true population size is 1,000 than when it is 10,000). For large population sizes the SS estimator approaches the VH estimator because the finite population adjustment has little impact, so using the SS estimator with a too-large population size specification will pull it toward the VH estimate. Therefore, the SS will perform at least as well as the VH unless the population size is dramatically underestimated.

The complexity of the relationship between a population’s characteristics and RDS CI coverage is high, so the specific relationships between prevalence, sample size, and homophily and the performance of RDS estimator pairs require further investigation. More generally, the number of such population characteristics that must be systematically varied in a simulation (the “parameter space”) to disentangle the combinations of factors that influence RDS CI coverage is very large. A systematic study of that parameter space is needed to provide evidence about RDS CI coverage in the large variety of settings in which RDS is applied.

While other work has suggested RDS variance estimators perform poorly, our analysis suggests those results can, at least partially, be attributed to the choice of bootstrap method and unrealistic use of with-replacement sampling in prior studies. For the SH and VH estimators, we found that using the studentized bootstrap, as compared to the percentile bootstrap, increased the percentage of CIs with acceptable coverage from 40% to 67.5% and from 42.5% to 67.5%, respectively (

We also found significantly smaller DEs than Goel and Salganik, with evidence that sampling with replacement increases the DE. For example, for without replacement sampling, both SS/SS and VH/Sal-BS produced actual DEs less than 3 in 92.5% of our conditions (37/40) and 62.5% less than 2, whereas for with replacement sampling the VH/Sal-BS estimator pair DE was less than 3 in only 67.5% of our conditions with only 30% less than 2. This echoes findings by Lu and colleagues and Gile and Handcock that sampling without replacement may reduce the DEs for RDS (

Furthermore, the estimated DEs were more accurate for sampling without replacement than for sampling with replacement. For example, for the VH/Sal-BS estimator pair sampling without replacement produced estimated DEs within a factor of 2 of the actual DEs 91.8% of the time, with slightly less than half (47.3%) being lower than the actual DE (the anti-conservative direction). In contrast, for with replacement sampling the estimated DEs were within a factor of 2 of the actual DEs only 79.9% of the time, with a significant majority (82.8%) being lower than the actual DE. Overall, the estimated DEs for the SS/SS estimator pair were the most accurate: 92.9% within a factor of 1.5, with fewer large outliers (see the

The RDS sampling process is highly complex and only partially observed in real RDS studies, so many choices about simulation design and specification must be made without reference to empirical data. Because the ultimate goal of RDS simulation studies is to understand how RDS performs in the real world, we recommend conducting RDS simulations without replacement and with parameters informed by real RDS samples to the extent possible.

This study’s simulations find that RDS DEs are in the range suggested in other methodological work that did not use simulation studies (

We used data from a large number of real RDS studies to parameterize our simulated networks and RDS sampling process. These RDS samples were of PWID in large US urban areas, so the results are likely most applicable to RDS samples drawn from large cities. Most of the largest RDS studies in the world occur in such places, such as studies conducted in China and Brazil (

Our results are subject to a number of limitations. First, although the networks created for our simulations were designed with some structural characteristics similar to those of PWID networks in NHBS cities, the true structure and complexity of hidden population networks is unknown. Almost all social networks contain structure that is not observed in RDS data. For example, an outcome might vary across a city’s neighborhoods, and the PWID networks in some neighborhoods may have few connections to those in other neighborhoods. The ERGM used to create our simulated networks did not directly specify such complex network structure, as it is unclear what the correct levels of such structure should be. Note that for such network structure to strongly influence RDS estimation, it must be strongly related to the outcome (e.g., quite different prevalences of the trait across the weakly connected subgroups).

Second, the characteristics we used to create the networks for our simulations were estimated from NHBS samples using RDS estimators. Therefore, the simulations are not replicates of the 40 samples collected by NHBS but are, instead, examples of networks and RDS processes similar to those observed in the NHBS samples. The results may be sensitive to our use of large networks and small sampling fractions as in the NHBS samples. The stability of NHBS samples of PWID over time suggests that our findings are applicable to future NHBS studies of PWID.

Third, our simulations implemented RDS with only a few statistical assumptions not met. Both the SH and VH point estimators assume that recruitment trees do not branch (i.e., each sample member makes exactly one recruitment) and that sampling is with-replacement, neither of which was true in our simulations. Other RDS statistical assumptions such as participants recruiting randomly from their set of contacts and, for the SS estimator, that the population size is known, were met. It is known that violations of RDS point estimator assumptions decrease the accuracy of RDS point estimates (

Fourth, our analysis did not evaluate all RDS variance estimators. Some work has proposed new point estimators that were accompanied by minor modifications to an existing variance estimator to incorporate the new point estimator (

Sampling hidden populations is critical for public health surveillance and planning around the world. RDS is effective at reaching members of hidden populations that other sampling methods cannot and is inexpensive enough to be used in low‐resource settings. These strengths have led to its wide use around the world for many different applications.

Past research on RDS variance estimation suggested that RDS variance estimator CIs provide very low coverage rates and that RDS has higher DEs than has been assumed in the public health literature (

RDS is used around the world to sample hidden populations that suffer from high rates of infection by HIV and other diseases. It is critical that researchers draw correct conclusions from RDS data by applying appropriate statistical techniques. We look forward to an improved understanding of RDS estimation that will better inform the policies critical to preventing and reducing the burden of disease borne by hidden populations worldwide.

De-identified characteristics for each sample may be found in the

De-identified characteristics for each sample may be found in the

Sample numbers are prefixed with “A” for samples from 2009 and “B” for samples from 2012. Sample numbers were randomly assigned to cities and are consistent across the two survey years (e.g., A-01 is the same city as B-01).

Acceptable coverage percent is calculated as the percentage of confidence intervals (CIs) with coverage between 93% and 97%, inclusive, for a given estimator pair and bootstrap CI method.

We have also observed this pattern of SH estimator behavior in its implementation in the Respondent-Driven Sampling Analysis Tool v7.1 software (

95% confidence interval (CI) coverage percentages for 40 sets of RDS simulations (sampling without replacement; studentized bootstrap CI method). The horizontal axis is the nominal 95% CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the SS coverage percentage (the red line). The left panel’s horizontal axis ranges from 0 to 100%; the right panel’s horizontal axis ranges from 80% to 100% for detail. The coverage percentages for the sample mean do not appear in the right panel.

95% confidence interval (CI) coverage percentages for 40 sets of RDS simulations (VH/Sal-BS estimator pair) by bootstrap CI method and sampling with and without replacement. The horizontal axis is the nominal 95% CI coverage percentage, and the vertical axis is the 40 simulation sets ordered from top to bottom by the without replacement, studentized bootstrap condition (the purple line and triangles).

Findings from previous simulation studies of RDS variance estimation and design effects

Study | Point Estimator | Variance Estimator | Simulation approach/CI Method | Population Network Data Used for RDS Simulations | 95% CI Coverage | DE Result |
---|---|---|---|---|---|---|
Goel & Salganik ( | Volz-Heckathorn | Salganik | With Replacement/Percentile | 1987 attempted network census of high‐risk heterosexuals in Colorado Springs, Colorado | 52% (median) | 11 (median; multiple traits) |
Goel & Salganik | Volz-Heckathorn | Salganik | With Replacement/Percentile | Sample of United States adolescents in 7th – 12th grades between 1994 and 1996 | 62% (median) | 5.9 (median; multiple traits) |
Lu et al. ( | Volz-Heckathorn | N/A | N/A | Online social network in Sweden | N/A | 5 to 13 (multiple traits) |
Verdery et al. ( | Volz-Heckathorn | Salganik | With Replacement/Studentized | Sample of United States adolescents in 7th – 12th grades between 1994 and 1996 | 68% (mean) | 15 (mean; 3 traits) |
Verdery et al. | Volz-Heckathorn | Salganik | With Replacement/Studentized | Multiple Facebook social networks of college students | 65% (mean) | 30 (mean; 2 traits) |

Summary of 40 NHBS samples used to create RDS simulations

Characteristic | Mean | Std Dev | Median | Minimum | Maximum |
---|---|---|---|---|---|
Prevalence | 0.104 | 0.0653 | 0.091 | 0.018 | 0.286 |
Mean Degree | 10.64 | 5.096 | 9.88 | 4.45 | 35.39 |
Homophily | 1.226 | 0.2281 | 1.19 | 0.91 | 1.99 |
Differential Activity | 0.931 | 0.2098 | 0.92 | 0.53 | 1.44 |
Sample Size | 519.1 | 108.85 | 539.5 | 206 | 700 |
Number of Seeds | 8 | 3.31 | 8 | 3 | 16 |
Number of Seeds With Trait | 1.1 | 1.18 | 1 | 0 | 5 |
Number of Seeds Without Trait | 6.8 | 3.21 | 7 | 1 | 16 |
Number of Seeds Missing Trait | 0.13 | 0.404 | 0 | 0 | 2 |
% of Coupons Returned | 30.60% | 6.60% | 33.20% | 20.00% | 49.80% |
Number Recruits = 0 | 33.90% | 7.01% | 35.50% | 21.40% | 48.00% |
Number Recruits = 1 | 21.80% | 5.07% | 22.10% | 9.10% | 32.10% |
Number Recruits = 2 | 17.70% | 3.54% | 18.20% | 10.00% | 25.10% |
Number Recruits = 3 | 10.50% | 2.69% | 10.00% | 4.60% | 16.00% |
Number Recruits = 4 | 1.70% | 2.00% | 0.67% | 0% | 7.70% |
Number Recruits = 5 | 0.54% | 0.65% | 0.30% | 0% | 2.40% |

Assigned to be without trait for purposes of sampling simulation

Among sample members who were given coupons.

These numbers include 6 studies where a maximum of 3 coupons were distributed per subject; the counts for those studies are constrained to be 0.

95% confidence interval (CI) coverage percentages for four RDS point and variance estimator pairs by bootstrap CI method

Point Estimator | Variance Estimator | Bootstrap CI Method | Mean | Standard Deviation | Median | Range | Acceptable Coverage % |
---|---|---|---|---|---|---|---|
Sample mean | SRS variance | N/A | 67.4 | 23.8 | 74.9 | [14, 96] | 5 |
Salganik-Heckathorn | Salganik | Percentile | 87 | 12.8 | 91.9 | [41, 96] | 40 |
Salganik-Heckathorn | Salganik | Studentized bootstrap | 93 | 2.8 | 93.9 | [86, 97] | 67.5 |
Volz-Heckathorn | Salganik | Percentile | 87 | 12.8 | 91.8 | [41, 96] | 42.5 |
Volz-Heckathorn | Salganik | Studentized bootstrap | 92.9 | 3.2 | 93.9 | [82, 97] | 67.5 |
Successive Sampling | Successive Sampling | Percentile | 94.1 | 1.8 | 94.6 | [87, 97] | 80 |
Successive Sampling | Successive Sampling | Studentized bootstrap | 93.8 | 1.8 | 94.2 | [87, 96] | 75 |

Acceptable coverage percent is calculated as the percentage of confidence intervals (CIs) with coverage between 93% and 97%, inclusive, for a given estimator pair and bootstrap CI method.

Design effects for four RDS point estimators by sampling with or without replacement

Point Estimator (sampling method) | Range | Median | Mean | Standard Deviation |
---|---|---|---|---|
Sample mean (without replacement) | [0.75, 2.64] | 1.34 | 1.42 | 0.49 |
Salganik-Heckathorn (without replacement) | [0.83, 95.51] | 1.72 | 7.47 | 19.96 |
Volz-Heckathorn (without replacement) | [0.81, 6.19] | 1.69 | 1.91 | 0.96 |
Successive Sampling (without replacement) | [0.83, 6.03] | 1.66 | 1.89 | 0.93 |
Volz-Heckathorn (with replacement) | [1.01, 7.97] | 2.34 | 2.77 | 1.48 |

Point estimator and sampling method used in

Comparison of estimated and actual design effects by estimator pair and sampling method

Point Estimator (sampling method) | Within a factor of 1.5 of the actual DE: percent | Within a factor of 1.5: percent of those that are too low | Within a factor of 2 of the actual DE: percent | Within a factor of 2: percent of those that are too low | Within a factor of 3 of the actual DE: percent | Within a factor of 3: percent of those that are too low |
---|---|---|---|---|---|---|
SH/Sal-BS (without replacement) | 78.1 | 45.8 | 83.7 | 46.2 | 88.3 | 47.0 |
VH/Sal-BS (without replacement) | 83.5 | 46.6 | 91.8 | 47.3 | 97.5 | 48.0 |
SS/SS-BS (without replacement) | 92.9 | 51.3 | 98.6 | 51.0 | 99.8 | 51.0 |
VH/Sal-BS (with replacement) | 56.9 | 79.4 | 79.9 | 82.8 | 93.5 | 84.5 |

Between 66% and 150% of the actual DE

Between 50% and 200% of the actual DE

Between 33% and 300% of the actual DE

Point estimator and sampling method used in