In recent years, response rates for telephone surveys have declined. Rates for the Behavioral Risk Factor Surveillance System (BRFSS) have also declined, prompting the use of new weighting methods and the inclusion of cell phone sampling frames. A number of scholars and researchers have studied the reliability and validity of BRFSS estimates in the context of these changes. As the BRFSS changes its sampling and weighting methods, a review of reliability and validity studies of the BRFSS is needed.
To assess the reliability and validity of prevalence estimates from the BRFSS, scholarship published from 2004–2011 testing the reliability and validity of BRFSS measures was compiled and presented by health risk behavior topic. The quality of each publication was assessed using a categorical rubric. Higher rankings were given to authors who conducted reliability tests using repeated test/retest measures, or who conducted tests using multiple samples. A similar rubric was used to rank validity assessments. Validity tests which compared the BRFSS to physical measures were ranked higher than those comparing the BRFSS to other self-reported data. Literature which undertook more sophisticated statistical comparisons was also ranked higher.
Overall findings indicated that BRFSS prevalence rates were comparable to other national surveys which rely on self-reports, although specific differences are noted for some categories of response. BRFSS prevalence rates were less similar to surveys which utilize physical measures in addition to self-reported data. There is very little research on reliability and validity for some health topics, but a great deal of information supporting the validity of the BRFSS data for others.
Limitations of the examination of the BRFSS were due to question differences among surveys used as comparisons, as well as mode of data collection differences. As the BRFSS moves to incorporating cell phone data and changing weighting methods, a review of reliability and validity research indicated that past BRFSS landline only data were reliable and valid as measured against other surveys. New analyses and comparisons of BRFSS data which include the new methodologies and cell phone data will be needed to ascertain the impact of these changes on estimates in the future.
Health officials recognize the need for accurate data for purposes of program planning, policy evaluation and estimation of health risk prevalence [
Individual states use data from the BRFSS to assess need and plan public health priorities. These data have been essential to states and local jurisdictions and have historically been shown to be useful as sources of information [
Given the importance of the BRFSS data to its constituent jurisdictions, continuous validation of findings is requisite. CDC, perforce, conducts numerous internal checks on BRFSS data. Independent practitioners have also tested BRFSS reliability and validity within their areas of interest. A comprehensive reliability/validity study of BRFSS was conducted earlier by Nelson [
In its current form, the BRFSS not only produces a large data set covering a number of health risk behaviors, but also provides a number of services to states which are engaged in the process of data collection [
The BRFSS is one of several surveys which compile health data in a variety of modes and methods. Many researchers review BRFSS prevalence indicators in terms of prevalence rates from other surveys which can be used to produce national estimates. These include:
The NHIS is conducted continuously throughout the year using face-to-face interviews in respondents’ homes. Basic health information is collected for all family members, by proxy if necessary. Additional health and socio-demographic information, including health risk behaviors, is collected, by self-report, from one adult family member [
The NHANES collects information on adults and children and combines face-to-face interviews with physical examination measures. The NHANES has been conducted periodically since the early 1960s. In 1999 the NHANES became a continuous survey with data released every two years [
The NSDUH is annually compiled from face-to-face interviews. It focuses primarily on substance abuse among respondents 12 years of age and older [
The CPS is conducted by the Bureau of Labor Statistics and the Census Bureau [
The NSFG gathers information using personal interviews [
Despite studies which support findings from self-reported information [
No research effort can result in a comprehensive disclosure of all relevant publications, especially on a publicly available dataset which encompasses a wide range of topics. The articles presented here were obtained through an extensive search of publication indices (PubMed, ProQuest, and ScienceDirect). Within each search inquiry, keywords included “BRFSS,” “validity,” and/or “reliability.” Any article which included testing of BRFSS reliability and/or validity was included. Articles which expressed only opinions, without any comparisons or statistical testing, were not considered. Given that the purpose of this research was to validate self-reported estimates in an era of declining landline telephone coverage, only those articles published from 2004–2011 were included. Articles were then categorized and are presented in the following topic areas:
1. Access to health care/general health
2. Immunization, preventive screening, and testing
3. Physical activity measures
4. Chronic disease
5. Mental health measures
6. Overweight and obesity measures
7. Tobacco and alcohol use measures
8. Responsible sexual behavior measures
9. Injury risk and violence
Quality of individual studies may vary significantly. Therefore a scoring rubric was devised to estimate the rigor of the tests of reliability and/or validity found in the literature. Higher rankings on the reliability rubric were achieved by authors who conducted reliability tests using repeated test/retest measures, used multiple samples/populations or multiple time periods. The rubric was also scored higher if authors conducted statistical tests, rather than simply comparing prevalence estimates. Authors who simply tested reliability by noting that results within the BRFSS were internally consistent were ranked lower on the reliability rubric. A similar rubric was used to rank validity assessments. Validity tests comparing the BRFSS to physical measures were ranked highest. Comparing BRFSS validity over time or comparing BRFSS against other self-reported data were ranked lower. Higher ranked assessments of validity and reliability were also characterized by more rigorous statistical comparisons, including the use of sensitivity and specificity measures [
1. The number of articles relating to reliability of the BRFSS
2. The number of articles relating to validity of the BRFSS
3. The quality of reliability tests used by authors
4. The quality of validity tests
5. An overall assessment of the literature on reliability and validity of the BRFSS
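Where the reviewed literature reported sensitivity and specificity, those measures compare self-reports against a gold standard such as medical records. A minimal sketch of that calculation, using entirely hypothetical data (not from any cited study), is:

```python
def sensitivity_specificity(self_reports, records):
    """Compare binary self-reports (e.g., 'vaccinated: yes/no') against a
    gold-standard record for the same respondents.

    Sensitivity = true positives / all record-positives
    Specificity = true negatives / all record-negatives
    """
    tp = sum(1 for s, r in zip(self_reports, records) if s and r)
    tn = sum(1 for s, r in zip(self_reports, records) if not s and not r)
    fn = sum(1 for s, r in zip(self_reports, records) if not s and r)
    fp = sum(1 for s, r in zip(self_reports, records) if s and not r)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: 10 respondents' self-reports vs. medical records
reports = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
records = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
sens, spec = sensitivity_specificity(reports, records)
```

With these hypothetical vectors, sensitivity is 5/6 (one record-confirmed case was not self-reported) and specificity is 3/4 (one respondent self-reported a status the record did not confirm).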
Thus the method used to assess the literature followed the path illustrated in Figure
Method of assessing literature on reliability and validity of the BRFSS.
The literature provided thirty-two examples of reliability and validity tests published since 2004 for the BRFSS among the ten categories. The literature did not evenly examine each of the topics covered by the BRFSS, and published reports of validity and reliability varied in quality. The largest number of articles was identified for physical activity, access to health care, immunization and preventive testing, and diagnoses of chronic disease/conditions (Table
Overall number and ranking of reliability and validity tests for BRFSS estimates
| Access to Health Care/ General Health | 4 | High (test/retest) | High (comparisons with other surveys and HMO records) | High levels of reliability and validity using published information and test retest methods; BRFSS noted to have higher proportions of persons reporting poorer health. |
| Immunization and Preventive testing | 4 | High (test/retest) | High (comparisons with other surveys, national registry data, medical records) | BRFSS rates highly reliable; Validity testing against medical records of individuals high; Validity testing indicating over reporting for some screening tests |
| Physical activity | 8 | High (test/retest; time trend) | High (comparisons with other surveys, respondent logs, accelerometers, physical measures) | Better reliability assessment among physically active groups; self-reports substantially reliable; Validity when compared to physical measures moderate |
| Chronic conditions | 4 | High (test/retest; comparisons with telephone and written responses) | High (comparisons with physical measures, prescription drug use data, medical history) | High levels of agreement in reliability testing; Some differences in prevalence rates among national surveys when compared to physical measures. |
| Mental Health Measures | 2 | High (test/retest with multiple indicators) | N/A | Substantial agreement between test/retest measures |
| Obesity | 3 | N/A | Moderate (comparisons against other national surveys, physical measures) | Self-reports reliable across modes; Differences between self-reports and physical measures |
| Tobacco use | 2 | N/A | Moderate (comparisons with other national surveys, physical measures) | Self-reports reliable across modes; Differences between self-reports and physical measures |
| Alcohol/Substance abuse | 2 | Moderate | Moderate (comparisons with other national surveys) | Trends and risk factors in BRFSS validated by other national surveys; BRFSS prevalence rates lower than other measures at national level and some state levels |
| Health risk and Sexual behavior | 2 | N/A | Moderate (comparisons with other national surveys) | BRFSS produces slightly higher estimates of measures than other national surveys; Differences in prevalence rates for the HIV testing question. |
| Injury and violence | 1 | N/A | Moderate (comparisons with other survey in nonrandom setting) | High levels of agreement between two surveys using nonrandom samples |
Validity of BRFSS and other self-reported data was best when respondents were asked about behaviors which were not sensitive, and questions referred to discrete events such as enrollment in health care plans, immunization or testing. In some cases, researchers found BRFSS to be reliable and valid for some groups of individuals and not for others. For example, respondents who reported strenuous physical exercise were found to provide more reliable and valid information than respondents who reported moderate, light or no physical exercise [
Several scholars investigated whether self-reported claims of health care coverage can be substantiated (Table
Reliability and validity studies of general health assessment and health care access estimates
| Mucci (2006) | Reliability of self-reported health insurance coverage | Self-reports of enrollment agreement .93 |
| Type of plan self-report reliability .79 | ||
| Clements (2006) | Reliability of self-reported HMO health care plan | Comparison of self-reports to external data |
| Test/retest agreement 78% | ||
| Fahimi (2008) | Validity of prevalence of health insurance coverage | No health insurance |
| BRFSS (18.4-19.1)* | ||
| NHIS (18.7-20.0)* | ||
| No medical care due to cost | ||
| BRFSS (14.8-15.4)* | ||
| NHIS (7.4-8.2)* | ||
| Salomon (2009) | Comparison of trends of general health in BRFSS and other national surveys; Comparison of prevalence of BRFSS and other national surveys | Prevalence of “fair” or “poor” health/ males: |
| NHIS (11.3-12.7)* | ||
| BRFSS (15.9-16.8)* | ||
| Prevalence of “fair” or “poor” health/ females: | ||
| NHIS (12.9-14.1)* | ||
| BRFSS (16.6-17.2)* | ||
| BRFSS more likely to show increased proportions of self-reports of “fair” or “poor” health |
Fahimi [
Several measures of immunization, preventive screening and testing are collected by the BRFSS (Table
Reliability and validity studies of immunization, preventive screening and testing estimates
| Shenson (2005) | Reliability and validity testing of immunization questions | Test/retest agreement on vaccination questions was 73%; Self-reports had a sensitivity of .75 and specificity of .83 when compared against medical records. |
| Bradbury (2005) | Test/ retest of colorectal cancer screening tests | Variation in reliability estimates due in part to time period between test/retest: |
| Cronin (2009) | Validity testing of mammography screening using registry rates and BRFSS rates | Estimates of BRFSS over reporting of mammography: |
| 16% women 40-49 | ||
| 25% women 70-79 | ||
| Fahimi (2008) | Validity testing of immunization questions from BRFSS, NHIS | Influenza vaccine prevalence: |
| BRFSS (66.9-68.2)* | ||
| NHIS (63.2-66.0)* | ||
| Pneumonia vaccine prevalence: | ||
| BRFSS (62.7-64.1)* | ||
| NHIS (55.3-58.3)* |
Cronin [
Questions on the BRFSS related to physical activity produced data that allow researchers to classify respondents into levels of recommended and vigorous physical activity, from inactive to vigorously active. Eight studies were identified from the literature which presented findings of reliability and/or validity of BRFSS physical activity measures (Table
Reliability and validity studies of physical activity estimates
| Yore (2007) | Reliability and validity using comparison with physical measures and repeated telephone interviews | Moderate activity group |
| Vigorous activity | ||
| Recommended activity | ||
| Strengthening measures | ||
| Self-reports/ personal log | ||
| Self-reports/accelerometer | ||
| Yore (2005) | Reliability using repeated measures and self-reported logs | 1-5 days between surveys |
| 10-18 days between surveys | ||
| 10-19 days between surveys | ||
| Everson (2005) | Reliability using test/retest telephone surveys including gender and racial differences among indicators | Moderate activity ICC = .32-.58 |
| Vigorous activity ICC = .55-.85 | ||
| Leisure activity ICC = .46-.68 | ||
| Occupational activity ICC =.82 | ||
| Sedentary indicators ICC = .32-.90 | ||
| Brown (2004) | Reliability using test/retest telephone surveys | Percent agreement for classification of active/insufficiently active/ sedentary 77.6; |
| All activity groups | ||
| Walking measures ICC = .45 | ||
| Moderate activity ICC= .44 | ||
| Vigorous activity ICC= .39 | ||
| Hutto (2008) | Reliability of questions when question order is changed | Question order effect BRFSS and alternate order, respectively |
| Walking (37.7 , 41.0) | ||
| Vigorous Activity (34.7, 37.0) | ||
| Moderate Activity (40.3, 30.5) | ||
| Meeting Physical Activity Recommendations (53.9, 51.7) | ||
| Pettee (2008) | Reliability of questions over different time periods | ICC= .42 and .55 for 3 week and 1 week test/retest on TV watching and physical activity question |
| Carlson (2009) | Validity testing comparing prevalence across surveys and methods | Active: mean difference BRFSS/ NHIS: 18.1 |
| Active: mean difference BRFSS/ NHANES: 14.8 | ||
| Inactive: mean difference BRFSS/ NHIS: 26.8 | ||
| Inactive: mean difference BRFSS/ NHANES: 18.5 | ||
| Reis (2005) | Validity testing of multiple indicators from OPAQ to single question on the BRFSS | Agreement between single BRFSS occupational measure and OPAQ: |
Everson and McGinn [
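The intraclass correlation coefficients (ICCs) reported in these test/retest studies can be illustrated with a short sketch. The one-way random-effects form ICC(1,1) below is a common textbook variant, and the data are hypothetical, not drawn from any cited study:

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for test/retest pairs.

    pairs: list of (test, retest) scores, one tuple per respondent.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k = 2 administrations.
    """
    n = len(pairs)
    k = 2  # two administrations (test and retest)
    grand = sum(sum(p) for p in pairs) / (n * k)
    subj_means = [sum(p) / k for p in pairs]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum((x - m) ** 2
              for p, m in zip(pairs, subj_means)
              for x in p) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical test/retest minutes of weekly physical activity
data = [(150, 160), (30, 45), (0, 10), (300, 280), (90, 100)]
icc = icc_oneway(data)
```

For these closely agreeing hypothetical pairs the ICC exceeds .99; noisier retest responses push the statistic down toward the .32–.85 range reported for the BRFSS activity items.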
The BRFSS may also be compared with other surveys, interviews and physical measures taken of the same or similar populations. Carlson [
Reis [
Overall, the identified studies of reliability and validity for physical activity measures supported findings of the BRFSS. The reliability of indicators was supported using test/retest methods and time trend methods. Reliability measures for physical activity questions were found to be in the fair to substantial ranges of the kappa statistic. Findings indicate that the most reliable estimates were achieved for persons who exercise regularly. Validity was assessed by comparison with other surveys, although some of the comparison surveys used different data collection methods. Some research also compared BRFSS physical activity measures and responses to physical measures such as accelerometers. Variation of prevalence estimates was found in some instances, but trends were similar when comparing among survey results over time.
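The "fair to substantial" language follows conventional benchmarks for Cohen's kappa (roughly .21–.40 fair, .61–.80 substantial). A minimal sketch of the statistic for activity-category test/retest responses, using hypothetical data, is:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two categorical ratings of the same respondents
    (e.g., activity category at test vs. retest).

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    n = len(a)
    cats = set(a) | set(b)
    po = sum(1 for x, y in zip(a, b) if x == y) / n          # observed
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)  # chance
    return (po - pe) / (1 - pe)

# Hypothetical classification of six respondents at test and retest
test = ["active", "active", "sedentary", "insufficient", "active", "sedentary"]
retest = ["active", "sedentary", "sedentary", "insufficient", "active", "sedentary"]
k = cohens_kappa(test, retest)
```

Here five of six respondents keep their category, giving κ ≈ .74, in the substantial range.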
It is not surprising to find that differences in reporting physical activity change over time. Respondents who were contacted for test/retest studies may have, in fact, changed their levels of activity in the interim between testing. Therefore, higher levels of reliability of measures in shorter term retests are reasonable.
The BRFSS collected data on a number of chronic conditions, including diabetes, asthma, arthritis, and cardiovascular diseases. Fahimi [
Reliability and validity studies of chronic condition estimates
| Fahimi (2008) | Comparison of BRFSS, NHIS and NHANES prevalence estimates | Diabetes |
| BRFSS (7.9-8.1)* | ||
| NHIS (7.8-8.5)* | ||
| NHANES(5.1-7.4)* | ||
| Asthma | ||
| BRFSS (13.1-13.6)* | ||
| NHIS (9.5-10.3)* | ||
| Bombard (2005) | Validity and reliability of arthritis questions using different modes and physical measures | Sensitivity 70.8% |
| Specificity 70.3% | ||
| Agreement between phone and written responses | ||
| Sacks (2005) | Validity of BRFSS arthritis questions using physical measures | For ages 45-64 |
| Sensitivity 77.4% | ||
| Specificity 58.8% | ||
| For ages 65 and older | ||
| Sensitivity 83.6% | ||
| Specificity 70.6% | ||
| Cossman (2008) | Validity of BRFSS cardiovascular measures using prescription data | Correlation coefficients (r) = .43-.66 |
Bombard [
The BRFSS included a number of quality of life and related mental health measures. Andresen [
Reliability and validity studies of mental health estimates
| Andresen (2003) | Reliability test/retest of quality of life measures among Missouri respondents | Overall health ( |
| Poor physical health days ( | ||
| Poor mental health days ( | ||
| Limited activity days ( | ||
| Healthy days ( | ||
| Frequent mental distress ( | ||
| Frequent physical distress ( | ||
| Kapp (2009) | Test/retest of quality of life measures among cancer survivors and other respondents | Physical distress( |
| Activity limitation ( | ||
| Social and emotional support ( | ||
| Life satisfaction ( | ||
| Pain ( | ||
| General health ( |
Three components of behavioral health and status (overweight and obesity, tobacco use and alcohol use) are examined in this section. A comprehensive study of multiple indicators from BRFSS, NHANES and NHIS was conducted by Fahimi [
Reliability and validity studies of obesity estimates
| Fahimi (2008) | Comparison of BRFSS, NHANES, and NHIS measures of height and weight | NHIS and BRFSS height measures differed by .14 inches |
| NHIS and BRFSS weight measures differed by 1.2% | ||
| BRFSS and NHANES height measures were statistically identical | ||
| BRFSS weight measures fell between measures taken by NHANES (self-reports) and NHIS | ||
| Ezzati (2006) | Weighting BRFSS self-reports of height and weight by NHANES to correct for bias/ underestimation | BRFSS underestimation from 1999–2002 averaged 5.9%, but could be corrected by weighting |
| Yun (2006) | Weighting BRFSS self-reports of height and weight by NHANES to correct for bias/ underestimation by race, gender and age | BRFSS underestimated prevalence of obesity and overweight groups by 9.5 and 5.7 percentage points, respectively. Estimates for females aged 20–39 differed from NHANES physical measures most often. |
Prevalence estimates reported by Ezzati [
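The bias-correction idea in these weighting studies can be illustrated with the classic Rogan–Gladen adjustment, which rescales an apparent (self-reported) prevalence by the sensitivity and specificity of self-report relative to physical measurement. This is a textbook sketch with hypothetical numbers, not the regression-based corrections the cited authors actually used:

```python
def rogan_gladen(reported_prev, sens, spec):
    """Adjust an apparent prevalence for misclassification, given the
    sensitivity and specificity of self-report vs. a physical measure.

    true_prev = (apparent_prev + spec - 1) / (sens + spec - 1)
    """
    return (reported_prev + spec - 1.0) / (sens + spec - 1.0)

# Hypothetical: 24% self-reported obesity, with self-report sensitivity .80
# (some respondents under-report weight) and specificity .99
adjusted = rogan_gladen(0.24, 0.80, 0.99)
```

The adjusted prevalence, about 29%, is higher than the self-reported 24%, mirroring the direction of underestimation these studies describe.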
Although tobacco use is widely noted to be related to health status, relatively few comparative studies published since 2004 concern the reliability of tobacco use prevalence measures across national surveys. This may be because question formats differ across these surveys, making them difficult to compare. The BRFSS, NHIS and NHANES all measured tobacco use in some way. Klein [
Reliability and validity studies of tobacco and alcohol use estimates
| Klein (2007) | Validity comparison of online, personal interview, examination and telephone survey results of tobacco use | Smoking prevalence |
| BRFSS 20.9 (median) | ||
| NHIS 20.9-22.1 | ||
| NHANES self-reports 22.4-27.5 | ||
| NHANES physical measures 30.6-38.1 | ||
| HPOL 23.7-24.4 | ||
| Fahimi (2008) | Validity test comparing three national surveys | Current smoker prevalence estimates: |
| BRFSS (20.4-21.0)* | ||
| NHIS (20.3-21.6)* | ||
| NHANES (21.4-25.9)* | ||
| Fahimi (2008) | Validity test comparing national surveys | Binge drinking prevalence estimates: |
| BRFSS (4.2-4.4)* | ||
| NHIS (4.5-4.9)* | ||
| Average Number of drinks per occasion: | ||
| BRFSS (2.4-2.5)* | ||
| NHIS (2.4-2.5)* | ||
| Miller (2004) | Comparison of in-home and telephone survey results related to adult binge drinking | Binge drinking state level prevalence estimates: |
| NSDUH (21.2-22.0)* | ||
| BRFSS (14.5-15.5)* | ||
| Absolute differences by race, age, gender groups for national prevalence estimate: | ||
| (.06-8.1)* |
Fahimi [
Only two studies published since 2004 were identified which examined reliability and/or validity of BRFSS measures of health risks related to sexual behavior (Table
Reliability and validity studies of health risks related to sexual behavior, injury risk and partner violence
| Santelli (2008) | Validity of BRFSS using comparison with NSFG | BRFSS and NSFG, respectively |
| Not Sexually Active (16.5% and 12.5%) | ||
| Vasectomy (7.7% and 6.3%) | ||
| Use of the pill (21.9% and 19.6%) | ||
| Rhythm (1.5% and 1.0%) | ||
| Diaphragm (.5% and .2%) | ||
| Withdrawal (.3% and 2.7%) | ||
| Fahimi (2008) | Comparison of BRFSS and NHIS prevalence estimates | BRFSS (43.4-44.2)* |
| NHIS (33.9-35.3)* | ||
| Bonomi (2006) | Validity testing of BRFSS and WEB surveys | Agreement levels BRFSS/ WEB |
| Any abuse (88.2%) | ||
| Sexual abuse (93.6%) | ||
| Physical abuse (90.7%) | ||
| Fear due to threats (92.9%) | ||
| Controlling behavior (91.9%) |
Only one study published since 2004 was identified which examined reliability of BRFSS measures on violence and injury risk (Table
Despite concerns about declines in telephone survey response rates, the BRFSS is comparable to other national and state level surveys investigating similar topics. In comparison with the last comprehensive review of literature on reliability and validity conducted over a decade ago [
In some cases, even when prevalence estimates differed, other statistical relationships within survey datasets remained the same. For example, although rates of binge drinking differed among some of the surveys, the demographic characteristics associated with binge drinking persisted across all of the datasets examined by the literature cited here. In other cases where prevalence rates differed, trends noted in the BRFSS were also noted in other national surveys. Over- or under-reporting of health risk behaviors is in part a function of respondents’ desire to please interviewers, regardless of whether responses were collected by phone or in personal interviews. However, bias created by the physical presence of interviewers is likely to be stronger than that created by surveys conducted over the phone when respondents are asked sensitive questions [
The BRFSS produced similar prevalence rates as other surveys examined by the literature; however, care should always be taken when comparing estimates from different surveys. Consumers of information should examine the questionnaires, the number and timing of questions as well as the mode of interview and sampling methods before determining that prevalence rates are comparable. As BRFSS has moved to a new weighting method and included cell phone respondents in its sample, users should replicate their examination of reliability and validity of BRFSS estimates. This research updated that of Nelson [
BMI: Body Mass Index; BRFSS: Behavioral Risk Factor Surveillance System; CDC: Centers for Disease Control and Prevention; CPS: Current Population Survey; HMO: Health Maintenance Organization; HPOL: Harris Poll Online; ICC: Intraclass correlation coefficient; NHANES: National Health and Nutrition Examination Survey; NHIS: National Health Interview Survey; NSFG: National Survey of Family Growth; NSDUH: National Survey of Drug Use and Health; OPAQ: Occupational Physical Activity Questionnaire; WEB: Women’s Experience with Battering Scale.
The authors declare that they have no competing interests.
Carol Pierannunzi participated in the literature review and completed the first draft; Sean Hu participated in the literature review and commented on drafts of the manuscript; Lina Balluz conceived the manuscript and commented on drafts. All authors participated in responding to reviewers’ comments and suggestions for change.
The pre-publication history for this paper can be accessed here:
The authors wish to thank the members of the Survey Operations Team of the Division of Behavioral Surveillance at the Centers for Disease Control and Prevention. Thanks especially to Machell Town and Bill Garvin for their attention to data quality during the collection and weighting phases of the BRFSS. Thanks also to Dr. Chaoyang Li for helpful comments on previous versions of the manuscript.
The findings and conclusions in this manuscript are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.