Careful design of a dual-frame random digit dial (RDD) telephone survey requires selecting from among many options that have varying impacts on cost, precision, and coverage in order to obtain the best possible implementation of the study goals. One such consideration is whether to screen cellphone households in order to interview cellphone-only (CPO) households and exclude dual-user households, or to take all interviews obtained via the cellphone sample. We present a framework in which to consider the tradeoffs between these two options and a method to select the optimal design. We derive and discuss the optimum allocation of sample size between the two sampling frames and explore the choice of optimum
Modern random digit dial (RDD) telephone surveys in the U.S. use two samples: a sample of landlines and a sample of cellphone lines.
Because it is less costly on a per-unit basis and has a longer history of use, the landline sample is often the larger sample and the survey interview is attempted for all respondents in this sample. The interviewing protocol for the smaller cellphone sample is configured in one of two ways: (1) attempt to complete the survey interview for all responding persons, or (2) conduct a brief screening interview to ascertain the telephone status of the respondent, and then attempt to complete the survey interview only for respondents whose telephone status is classified as cellphone-only (CPO) (i.e., respondents who report in the screening interview that they do not have a working landline in their household). (Within the screening approach there are variations, such as interviewing both CPO respondents and others who report that there is a landline in the household but they are not reachable through the landline.) As the size of the landline-only (LLO) population (i.e., persons who have a working landline telephone in the household but do not have access to a cell phone) declines over time (
We shall develop the methods for optimum allocation under ideal assumptions that the sample sizes refer to completed cases (i.e., no nonresponse); that there is essentially a one-to-one relationship between the sampling units (telephone numbers) and the analytical units (e.g., households) in the landline population; that there is essentially a one-to-one relationship between the sampling units and the analytical units in the cellphone population; and that all units in the target population are included in at least one of the two sampling frames. Given these assumptions, each and every specific analytic unit is linked to a landline, a cellphone line, or both a landline and a cellphone line, and is linked to at most one landline and at most one cellphone line.
Most of the previous literature on dual-frame surveys studies estimation procedures rather than the question of allocation of the sample size to the various sampling frames, including
To begin, we establish our notation and assumptions. Let
Let
Let
In what follows, we derive the optimum allocation given the take-all protocol and the screening protocols in Section 2 and Section 3, respectively. Section 4 compares the two protocols in terms of efficiency and cost and attempts to provide guidance about the circumstances under which each protocol is better. The section also explores the optimum choice of a mixing parameter
In the take-all protocol, one conducts survey interviews for all units in both samples
The unbiased estimator of the population total (
Given fixed
The classical optimum allocation of the total sample to the two sampling frames (
In the screening protocol, one conducts survey interviews for all units in the landline sample
The unbiased estimator of the overall population total is
The optimal allocation of the total sample is
We compare the take-all and screening protocols to establish which is the less costly or more efficient. Such a comparison can provide practical guidance to planners of future dual-frame telephone surveys.
Given either fixed cost or fixed variance, efficiency can be assessed in terms of the ratio
Values less than 1.0 favor the screening approach, while values greater than 1.0 favor the take-all approach.
We will illustrate efficiency using six scenarios regarding a survey of a hypothetical adult population. For all scenarios, the population size is taken from the March 2010 Current Population Survey (
The scenario-specific assumptions are set forth in the following table:
The means correspond to the proportions of adults with the attribute. Scenario 1 describes a population in which the domain means are similar, with the mean of the dual-user domain being somewhat larger than the means of the CPO and LLO populations. Scenario 2 describes a population in which the mean of the LLO domain is somewhat larger than the means of the other telephone status domains. Scenario 3 reflects a population in which the means of all telephone status domains are equal. Scenario 4 reflects a population in which the mean of the LLO domain is much larger than the mean of the CPO domain. Scenarios 5 and 6 correspond to Scenarios 1 and 2, respectively, using means equal to one minus the corresponding means. The mean of the CPO domain declines from Scenario 1 to 6.
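The complement relationship between Scenarios 1 and 5, and between Scenarios 2 and 6, can be checked directly against the scenario table. The following is a minimal sketch; the five values per scenario are copied from the table in row order, and the function names are illustrative only.

```python
# Rows copied from the scenario table (five means per scenario, in table order).
scenarios = {
    1: [0.791, 0.750, 0.800, 0.750, 0.784],
    2: [0.759, 0.800, 0.750, 0.750, 0.750],
    5: [0.209, 0.250, 0.200, 0.250, 0.216],
    6: [0.241, 0.200, 0.250, 0.250, 0.250],
}


def complement(means):
    """Return one minus each mean, rounded to the table's three decimals."""
    return [round(1.0 - m, 3) for m in means]


def is_complement(base, other, tol=5e-4):
    """True when `other` equals one minus `base`, column by column."""
    return all(abs(x - y) <= tol for x, y in zip(complement(base), other))


# Scenario 5 should mirror Scenario 1, and Scenario 6 should mirror Scenario 2.
pairs_ok = {
    (1, 5): is_complement(scenarios[1], scenarios[5]),
    (2, 6): is_complement(scenarios[2], scenarios[6]),
}
print(pairs_ok)
```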
We selected the six scenarios to illustrate various circumstances in which the means of CPO, LLO, and dual-user domains differ. Differences can arise because younger adults, Hispanics, adults living only with unrelated adult roommates, adults renting their home, and adults living in poverty tend to be CPO (
We will consider the six scenarios using three assumed cost structures. The cost structures are intended to illuminate various circumstances in which the per-unit cost of screening is high or low relative to the cost of the survey interview, with Cost Structures 1–3 reflecting increasing relative cost of screening. All cost components are expressed in interviewing hours:
Cost Structure 1:
Cost Structure 2:
Cost Structure 3:
All three cost structures reflect circumstances in which the hours per case for a cellphone interview are about twice the hours per case for a landline interview.
Efficiencies corresponding to the various scenarios for the first cost structure are illustrated in Figure 4.1. We have prepared similar figures for the second and third cost structures, but to conserve space we do not present them here.
Given Cost Structure 1, the screening approach achieves the lower variance for the same fixed cost for all six scenarios. Given Cost Structure 3, in which the per-unit cost of screening is relatively much higher than in Cost Structure 1, the take-all approach achieves a smaller variance than the screening approach for half of the population scenarios. For Cost Structure 2, which entails an intermediate level of screening cost, the screening approach beats the take-all approach for all scenarios except for Scenario 1, in which the two approaches are nearly equally efficient.
The comparison between the take-all and screening protocols can be understood by examining the form of efficiency
It is also of interest to examine how the efficiency
For the take-all protocol, the optimum
In summary, one may conclude from these illustrations that the screening approach is often more efficient than the take-all approach. As the cost of the screener increases relative to the cost of the interview, the outcome can tip in favor of the take-all approach. The take-all approach will be preferred for surveys in which the cost of the screener is relatively very high; otherwise, the screening protocol will be preferred. The screening approach will tend to be relatively more efficient for small values of the CPO domain mean than for large values of this mean.
The optimum allocation is defined in terms of the mixing parameter, and thus it is important to consider the choice of this parameter. In the foregoing section, we saw that variance is likely not very sensitive to the choice of
The landline and cellphone samples each supply an estimator of the total in the dual-user domain, and the mixing parameter
From (
For the cost structures considered in this section, the corresponding
CDC has sponsored the
We will discuss the NIS as it was conducted in 2011. The main interview consisted of six sections, beginning with Section S, which is a brief questionnaire module that determines whether the household has age-eligible children. The interview is then terminated for ineligible households. For eligible respondents with an available vaccination record (shotcard), Section A obtains the child(ren)’s household-reported vaccination history. For all other respondents, Section B obtains a more limited and less specific amount of information about the child(ren)’s vaccinations. Section C collects demographic characteristics of the child(ren), the mother, and the household. Section D collects the names and contact information for the child(ren)’s vaccination providers and requests parental consent to contact the providers, while Section E collects information regarding current health insurance coverage.
The NIS is designed to produce estimates at the national level and for 56 non-overlapping estimation areas, consisting of 46 whole states, 6 large urban areas, and 4 rest-of-state areas. Each of these areas is a sampling stratum in the NIS design. For each of these areas, the NIS is designed to minimize the cost of the survey subject to a constraint on variance: the coefficient of variation (CV) of the estimator of the vaccination coverage rate (up-to-date (UTD) children as a proportion of all eligible children) is to be 7.5 percent at the estimation-area level, when the true rate is 50 percent.
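To give a rough sense of scale for the CV constraint, the sketch below computes the number of completed eligible cases that a single simple random sample would need to estimate a proportion of 50 percent with a CV of 7.5 percent. This is a simplifying assumption made for illustration only: it ignores the dual-frame structure, the mixing parameter, and any design effect, and the function name is hypothetical.

```python
import math


def srs_sample_size_for_cv(p, cv):
    """Completed cases needed so that an SRS proportion estimate has the given CV.

    Under simple random sampling (ignoring the finite population correction),
    Var(p_hat) = p * (1 - p) / n, so CV = sqrt((1 - p) / (p * n)) and
    n = (1 - p) / (p * cv**2).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if cv <= 0.0:
        raise ValueError("cv must be positive")
    return (1.0 - p) / (p * cv ** 2)


# Design target quoted in the text: CV of 7.5 percent when the true rate is 50 percent.
n_required = srs_sample_size_for_cv(p=0.5, cv=0.075)
print(math.ceil(n_required))  # about 178 completed eligible cases under SRS
```

Because the dual-frame estimators combine two samples with unequal costs and domain sizes, the allocations derived in the paper need not match this single-sample figure.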
Given the take-all protocol, the six-part survey interview is administered to all respondents in both samples. Given the screening protocol, the survey interview is administered to all respondents in the landline sample, while in the cellphone sample, the overall interview is now in two parts: (i) the brief screener to determine telephone status and (ii) the aforementioned six-part survey interview. Dual users are screened out of the cellphone sample.
To illustrate the optimum allocation, we take the per-unit costs to be proportional to the following values:
To estimate a vaccination coverage rate given the take-all approach, we work with the variable
Then, the estimated vaccination coverage rate is
To estimate a vaccination coverage rate given the screening design, we work with the variable
Given these assumptions, the values of the efficiency ratio
Given our assumptions, the optimum allocation for the take-all protocol at the optimum
We developed the optimum allocations revealed here under ideal conditions in which there is no nonresponse. To prepare a sample for actual use in the NIS (or any real survey), the allocation must be adjusted by the reciprocals of the expected survey cooperation rates and by the expected design effect due to weighting and clustering.
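The adjustment described above can be sketched as follows. The cooperation rate, design effect, and target number of completes used in the example are hypothetical values chosen only for illustration; they are not NIS figures, and the function name is likewise an assumption.

```python
import math


def released_sample_size(n_complete, cooperation_rate, design_effect):
    """Inflate an ideal allocation of completed cases for field use.

    The ideal allocation is multiplied by the expected design effect
    (weighting and clustering) and by the reciprocal of the expected
    cooperation rate, then rounded up to a whole number of cases.
    """
    if not 0.0 < cooperation_rate <= 1.0:
        raise ValueError("cooperation_rate must be in (0, 1]")
    if design_effect <= 0.0:
        raise ValueError("design_effect must be positive")
    return math.ceil(n_complete * design_effect / cooperation_rate)


# Hypothetical inputs: 300 ideal completes, 50 percent cooperation, design effect 1.25.
n_field = released_sample_size(n_complete=300, cooperation_rate=0.5, design_effect=1.25)
print(n_field)  # 750
```

In practice the two frames would usually have different expected cooperation rates, so the adjustment would be applied separately to the landline and cellphone allocations.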
While the extant evidence shows that the screening protocol is slightly less costly than the take-all protocol, given that both achieve the same fixed variance constraint, the take-all protocol actually provides the NIS an ongoing platform for testing and comparing both protocols. The authors continue to monitor the achieved sample composition and to conduct other specialized studies of response and nonresponse error.
We investigated two designs for a dual-frame telephone survey: a take-all protocol in which every respondent in the cellphone sample is interviewed and a screening protocol in which respondents in the cellphone sample are screened for phone status and only CPO respondents are interviewed. For each design, we derived the optimum allocation of the overall survey resources to the two sampling frames.
We studied the allocation problem given the two traditional meanings of the word “optimum”: (1) to minimize variance subject to a constraint on data collection cost, and (2) to minimize data collection cost subject to a constraint on variance. Given fixed variance, we find that the screening approach tends to achieve lower total cost than the take-all approach when the per-unit cost of screening is low relative to the unit cost of the survey interview. The take-all approach can achieve the lower total cost when the per-unit cost of screening is relatively high. Similarly, given fixed total cost, the screening protocol tends to be the more efficient approach when the per-unit cost of screening is relatively low, and the take-all protocol can be the more efficient approach as the per-unit cost of screening rises. Both the landline and cellphone samples have the capacity to produce estimators for the dual-user domain, while only the cellphone sample can produce estimators for the CPO domain. Thus, when screening is relatively inexpensive on a per-unit basis, it should be used to produce the largest possible sample from the CPO domain. But when screening is relatively expensive, it is better to avoid the screening step and invest the survey resources in a larger interview sample. These results were obtained under an assumption of simple random sampling, and they may not carry over exactly to other sampling designs.
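The two meanings of "optimum" can be illustrated with the classical cost-variance allocation for two independent samples. The sketch below assumes a generic variance of the form V = V_A/n_A + V_B/n_B and a linear cost C = c_A n_A + c_B n_B; it is not the paper's dual-frame variance, which also involves the mixing parameter and the telephone status domains. All names and numeric inputs are hypothetical.

```python
import math


def allocate_fixed_cost(unit_var, unit_cost, total_cost):
    """Minimize sum(V_h / n_h) subject to sum(c_h * n_h) == total_cost."""
    s = sum(math.sqrt(v * c) for v, c in zip(unit_var, unit_cost))
    return [total_cost * math.sqrt(v / c) / s for v, c in zip(unit_var, unit_cost)]


def allocate_fixed_variance(unit_var, unit_cost, target_var):
    """Minimize sum(c_h * n_h) subject to sum(V_h / n_h) == target_var."""
    s = sum(math.sqrt(v * c) for v, c in zip(unit_var, unit_cost))
    return [math.sqrt(v / c) * s / target_var for v, c in zip(unit_var, unit_cost)]


def variance(unit_var, n):
    return sum(v / k for v, k in zip(unit_var, n))


def cost(unit_cost, n):
    return sum(c * k for c, k in zip(unit_cost, n))


# Hypothetical inputs: the second sample is noisier and costs twice as much per case.
unit_var = [1.0, 4.0]
unit_cost = [1.0, 2.0]

n_cost = allocate_fixed_cost(unit_var, unit_cost, total_cost=100.0)
v_star = variance(unit_var, n_cost)

# Using the achieved variance as the constraint recovers the same allocation.
n_var = allocate_fixed_variance(unit_var, unit_cost, target_var=v_star)
print(n_cost, v_star, n_var)
```

In both problems the sample sizes are proportional to sqrt(V_h / c_h), so the two formulations differ only in the overall scale of the sample.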
The take-all design results in two estimators for the dual-user domain, which are combined using factors of
We initiated this work before 2010 at a time when the CPO population in the U.S. was only a fifth to a quarter of the total population of households. At that time it made sense to contemplate a protocol in which the larger landline sample is interviewed in its entirety and the smaller cellphone sample is screened for CPO status. At this writing, however, the CPO population comprises more than a third of the total population of households and it is still growing. It has become reasonable to consider a new screening protocol in which the landline sample is screened for telephone status and only LLO respondents are interviewed. The foregoing allocations and findings apply to this new protocol by symmetry.
We illustrated the optimum allocations and the two interviewing protocols using the 2011
The authors gratefully acknowledge suggestions for improved readability offered by the Associate Editor and referees. Disclaimer: The findings and conclusions in this paper are those of the author(s), and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Plot of efficiency
Definition of six scenarios for a hypothetical adult population
Scenario  Landline-frame mean  LLO mean  Dual-user mean  CPO mean  Cellphone-frame mean
1  0.791  0.750  0.800  0.750  0.784 
2  0.759  0.800  0.750  0.750  0.750 
3  0.500  0.500  0.500  0.500  0.500 
4  0.518  0.600  0.500  0.400  0.469 
5  0.209  0.250  0.200  0.250  0.216 
6  0.241  0.200  0.250  0.250  0.250 
Sample sizes and optimum
Cost Structure  Screening Design  Take-All Design  


 
(1 −  
Scenario 1  
1  494  747  234  0.45  337  331 
2  469  641  201  0.45  337  331 
3  431  505  159  0.45  337  331 
Scenario 2  
1  506  728  229  0.45  339  330 
2  481  626  197  0.45  339  330 
3  443  494  155  0.45  339  330 
Scenario 3  
1  583  615  193  0.50  344  328 
2  559  533  167  0.50  344  328 
3  520  425  134  0.50  344  328 
Scenario 4  
1  605  582  183  0.55  377  312 
2  581  506  159  0.55  377  312 
3  543  405  127  0.55  377  312 
Scenario 5  
1  606  581  182  0.55  358  321 
2  582  505  159  0.55  358  321 
3  544  404  127  0.55  358  321 
Scenario 6  
1  618  563  177  0.55  354  323 
2  594  490  154  0.55  354  323 
3  557  393  123  0.55  354  323 
Expected sample sizes by telephone status domain given optimum allocations
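Sample and Telephone Status Domains  Take-All Protocol  Screening Protocol
  Expected Sample Size  Expected Age-Eligible Cases  Expected Sample Size  Expected Age-Eligible Cases
Landline sample  3,069  86  5,858  164
Cellphone sample  7,437  289  8,432  188
LLO (landline sample)  416  6  794  12
Dual-user (landline sample)  2,653  80  5,064  152
Dual-user (cellphone sample)  4,122  124  4,674  0
CPO (cellphone sample)  3,314  166  3,758  188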
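The expected sample sizes in this table can be checked for internal consistency. The sketch below reads the six rows, in order, as the landline sample total, the cellphone sample total, the LLO domain, the dual-user domain in the landline sample, the dual-user domain in the cellphone sample, and the CPO domain. That reading is an assumption inferred from the arithmetic of the rows, and the dictionary keys are illustrative names.

```python
# Expected sample sizes copied from the table, in row order (labels are assumed).
take_all = {
    "landline": 3069, "cellphone": 7437,
    "llo": 416, "dual_landline": 2653,
    "dual_cell": 4122, "cpo": 3314,
}
screening = {
    "landline": 5858, "cellphone": 8432,
    "llo": 794, "dual_landline": 5064,
    "dual_cell": 4674, "cpo": 3758,
}


def domains_sum_to_samples(t, tol=1):
    """Check that the domain rows add up to each sample total, within rounding."""
    landline_ok = abs(t["llo"] + t["dual_landline"] - t["landline"]) <= tol
    cell_ok = abs(t["dual_cell"] + t["cpo"] - t["cellphone"]) <= tol
    return landline_ok and cell_ok


checks = {
    "take_all": domains_sum_to_samples(take_all),
    "screening": domains_sum_to_samples(screening),
}
print(checks)
```

The tolerance of one case allows for rounding of the expected values in the published table.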