American Journal of Epidemiology (Am J Epidemiol). 2010;171(10):1079–1089. Oxford University Press. ISSN 0002-9262 (print), 1476-6256 (electronic). doi: 10.1093/aje/kwq026. PMID: 20421221. PMCID: PMC2866739.

Practice of Epidemiology

Improvements in Ability to Detect Undiagnosed Diabetes by Using Information on Family History Among Adults in the United States

Quanhe Yang*, Tiebin Liu, Rodolfo Valdez, Ramal Moonesinghe, Muin J. Khoury

* Correspondence to Dr. Quanhe Yang, National Office of Public Health Genomics, Centers for Disease Control and Prevention, 1600 Clifton Road, Northeast, MS E61, Atlanta, GA 30333 (e-mail: qay0@cdc.gov).

Received September 16, 2009; accepted January 14, 2010; published online April 25, 2010; print issue May 15, 2010.

American Journal of Epidemiology Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2010.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Family history is an independent risk factor for diabetes, but it is not clear how much adding family history to other known risk factors would improve detection of undiagnosed diabetes in a population. Using the National Health and Nutrition Examination Survey for 1999−2004, the authors compared logistic regression models with established risk factors (model 1) with a model (model 2) that also included familial risk of diabetes (average, moderate, and high). Adjusted odds ratios for undiagnosed diabetes, using average familial risk as referent, were 1.7 (95% confidence interval (CI): 1.2, 2.5) and 3.8 (95% CI: 2.2, 6.3) for those with moderate and high familial risk, respectively. Model 2 was superior to model 1 in detecting undiagnosed diabetes, as reflected by several significant improvements, including weighted C statistics of 0.826 versus 0.842 (bootstrap P = 0.001) and integrated discrimination improvement of 0.012 (95% CI: 0.004, 0.030). With a risk threshold of 7.3% (sensitivity of 40% based on model 1), adding family history would identify an additional 620,000 (95% CI: 221,100, 1,020,000) cases without a significant change in false-positive fraction. Study findings suggest that adding family history of diabetes can provide significant improvements in detecting undiagnosed diabetes in the US population. Further research is needed to validate the authors’ findings.

Keywords: decision analysis; logistic regression; mass screening; model fitting; nutrition surveys; risk

Family history is a consistent risk factor for many chronic diseases of public health significance (1) and, in the past few years, it has increasingly been discussed as a tool for preventing common diseases and for promoting health (2–4). In 2005, the US Surgeon General launched a public health campaign to enhance the public's awareness of the importance of family history (http://www.hhs.gov/familyhistory/), and the Centers for Disease Control and Prevention (CDC) has initiated a public health research initiative on this topic. The CDC's initiative is focused primarily on several common chronic diseases, including diabetes, stroke, heart disease, and cancers (http://www.cdc.gov/genomics/famhistory/index.htm). Yet, in spite of the increased interest in family history as a public health tool, the clinical validity and utility of this readily obtained risk factor have not been systematically evaluated.

In the present study, we assessed the improvements in detecting undiagnosed diabetes among US adults that might be obtained by using information on family history. Among an estimated 24 million individuals with diabetes in the United States in 2007 (based on fasting plasma glucose), 28% (6.6 million) were undiagnosed (5). One of the rationales for asking undiagnosed people about their family history is that a number of diabetes risk models/tools have included family history of diabetes as a risk factor, with an estimated relative risk 2–6 times that of people without family history (6–15). Furthermore, other studies suggest that family history might be an effective screening tool for identifying both diabetes and undiagnosed diabetes (1, 3, 14, 16–18). Even so, none of these studies has formally evaluated the improvements in detecting undiagnosed diabetes by using family history. This is important in part because both empirical and theoretical analyses have suggested that a significant and independent risk factor for a disease does not necessarily increase the ability to detect the disease or to discriminate between people with and without the disease (19).

Receiver-operating characteristic (ROC) curves and the associated C statistics are commonly used to summarize the diagnostic accuracy of risk models and to assess the improvements made to such models that are gained from adding other risk factors (20). Some studies, however, have criticized ROC curves for lacking the ability to display the risk in a particular population and to assess the reclassification of individuals into different risk groups (e.g., higher risk, lower risk) (19, 21). Recently, researchers have developed several alternative methods to assess the improvements made by a new marker or risk factor in risk models (22–25). Predictiveness curves, for example, display the distribution of risk in the population and also assess the classification ability of additional risk factors (24). Alternatively, the net reclassification improvement and integrated discrimination improvement integrate sensitivity, specificity, and the information from reclassification tables to assess improvements in risk models that include new risk factors (23). A third method involves net benefit curves, which might help to determine whether it would be cost-effective to include an additional risk factor in the risk model (25). We applied both conventional and recently developed methods to assess the improvements made from using family history to detect cases of undiagnosed diabetes among adults. Our source of data was the National Health and Nutrition Examination Survey (NHANES) for 1999–2004.

MATERIALS AND METHODS

NHANES is a series of stratified, multistage probability surveys designed to obtain information on the health and nutritional status of the civilian, noninstitutionalized US population. From 1999, NHANES data have been collected continuously, with every 2 years serving as 1 analytical cycle. The data are collected by the National Center for Health Statistics, CDC, via household interviews and physical examinations and are intended to provide estimates that are representative of the US population. Detailed information is available elsewhere (http://www.cdc.gov/nchs/nhanes.htm). The present study included 3 cycles (1999–2000, 2001–2002, and 2003–2004) of samples of adults aged ≥20 years who were examined in the morning after overnight fasting (between 8 and 23 hours) and did not have diagnosed diabetes. When analyzing combined data sets, we found that the sampling weights must be recalculated to produce unbiased estimates, because weights for the 1999–2000 cycle were based on population data prior to the 2000 US Census, and weights for the other cycles were based on the 2000 US Census. Detailed NHANES analytic and reporting guidelines that provide algorithms to recalculate the sampling weights can be found at the following website: (http://www.cdc.gov/NCHS/data/nhanes/nhanes_03_04/nhanes_analytic_guidelines_dec_2005.pdf).
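The weight recalculation can be illustrated with a minimal sketch. It follows one reading of the NHANES analytic guidelines for combining the 3 cycles used here: two-thirds of the 4-year weight for participants from 1999–2000 or 2001–2002, and one-third of the 2-year weight for participants from 2003–2004. The function and argument names are hypothetical, the cycle coding (1, 2, 3) is an assumption, and the guideline document cited above remains the authoritative source.

```python
def six_year_weight(cycle, weight_4yr=None, weight_2yr=None):
    """Return a combined 1999-2004 sampling weight for one participant.

    cycle: 1 = 1999-2000, 2 = 2001-2002, 3 = 2003-2004 (assumed coding).
    weight_4yr: 4-year weight supplied for the first 2 cycles.
    weight_2yr: 2-year weight supplied for the 2003-2004 cycle.
    """
    if cycle in (1, 2):
        # The first 2 cycles together cover 4 of the 6 survey years.
        return (2.0 / 3.0) * weight_4yr
    if cycle == 3:
        # The third cycle covers 2 of the 6 survey years.
        return (1.0 / 3.0) * weight_2yr
    raise ValueError("cycle must be 1, 2, or 3")


# Hypothetical participants: one from 2001-2002, one from 2003-2004.
w_a = six_year_weight(2, weight_4yr=30000.0)
w_b = six_year_weight(3, weight_2yr=45000.0)
```

For the fasting subsample analyzed in this study, the same fractions would be applied to the fasting-subsample weights rather than to the full examination weights.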

Undiagnosed diabetes and family history of diabetes

We excluded pregnant women and persons with diagnosed diabetes, with unknown diabetes status, and with missing values for some of the covariates. Participants with a fasting plasma glucose level of ≥126 mg/dL (7.0 mmol/L) who reported no previous diagnosis of diabetes were defined as cases of undiagnosed diabetes (26).

We classified all participants into 3 mutually exclusive groups of familial risk on the basis of their family history of diabetes among first- and second-degree relatives: 1) high (at least 2 first-degree relatives or 1 first-degree and at least 2 second-degree relatives from the same lineage); 2) moderate (just 1 first-degree and 1 second-degree relative with diabetes, or only 1 first-degree relative with diabetes, or at least 2 second-degree relatives with diabetes from the same maternal or paternal line); or 3) average (no family history of diabetes or, at most, 1 second-degree relative with diabetes) (15). We use the term “family history of diabetes” to mean all 3 groups (high, moderate, and average). Limited information on family history of diabetes in NHANES 1999–2004 does not allow further detailed analysis.
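The 3-level classification above can be sketched as a small function. The inputs (counts of affected first-degree relatives and of affected second-degree relatives on the maternal and paternal sides) and the function name are hypothetical and are not NHANES variable names. Two combinations are not spelled out in the text and are handled here by assumption: 1 first-degree relative plus second-degree relatives on different sides is treated as moderate, and second-degree relatives on different sides with no first-degree relative are treated as average.

```python
def familial_risk(n_first, n_second_maternal, n_second_paternal):
    """Classify familial risk of diabetes as 'high', 'moderate', or 'average'."""
    # Largest number of affected second-degree relatives within one lineage.
    same_line_second = max(n_second_maternal, n_second_paternal)

    # High: >=2 first-degree relatives, or 1 first-degree relative plus
    # >=2 second-degree relatives from the same lineage.
    if n_first >= 2 or (n_first == 1 and same_line_second >= 2):
        return "high"

    # Moderate: 1 first-degree relative (with or without 1 second-degree
    # relative), or >=2 second-degree relatives from the same lineage.
    # Assumption: 1 first-degree relative plus second-degree relatives on
    # different sides also falls here.
    if n_first == 1 or same_line_second >= 2:
        return "moderate"

    # Average: no family history or at most 1 second-degree relative.
    # Assumption: 1 maternal plus 1 paternal second-degree relative,
    # with no first-degree relative, also falls here.
    return "average"


examples = [(0, 0, 0), (1, 1, 0), (0, 2, 0), (1, 2, 0), (2, 0, 0)]
labels = [familial_risk(*e) for e in examples]
```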

Risk models

We used logistic regression models to calculate the predicted risk for undiagnosed diabetes. To select the appropriate models, we started with the list of those risk factors suggested by the American Diabetes Association that were available in NHANES 1999–2004 (27); these risk factors included age, race/ethnicity (non-Hispanic white, non-Hispanic black, Mexican American, others), body mass index, physical activity (inactive, irregularly active, regularly active), hypertension (≥140/90 mm Hg or on therapy for hypertension), a high density lipoprotein cholesterol level of ≤35 mg/dL (0.90 mmol/L) and/or a triglyceride level of ≥250 mg/dL (2.82 mmol/L), history of cardiovascular disease, and family history of diabetes. We used the backwards selection approach, including all suggested risk factors in the multiple logistic regression models with α = 0.10 to select the final models (28, 29). These final models included age, gender, body mass index, hypertension, low high-density lipoprotein cholesterol and/or elevated triglycerides, and family history of diabetes. We found no evidence of multicollinearity among the selected risk factors (30). We tested interactions between family history and other risk factors by including the product terms in the risk models and applying the Satterthwaite-adjusted F test; there was no evidence of significant interaction. We also fitted a full model that included age as a nonlinear term, the logarithm of high density lipoprotein cholesterol as a continuous variable, and the interaction between body mass index and high density lipoprotein cholesterol. This full model did not offer significant improvements over the main effects models (results not shown). For simplicity, we used the main effects models. Similar sets of risk factors have been used and validated by other studies using the NHANES data (15, 31, 32).
For assessments of the improvements in detecting undiagnosed diabetes by using family history of diabetes, we calculated 2 risk models: one that had the selected risk factors excluding family history of diabetes (model 1) and the other with the selected risk factors plus family history of diabetes (model 2, nested model).

Statistical analysis

The adjusted and weighted prevalence and odds ratios and 95% confidence intervals for undiagnosed diabetes were obtained by logistic regression models by using the predicted margins by the 3 categories of family history of diabetes (33). The prevalence and odds ratios were adjusted by the risk factors selected for the final model. We estimated the mean and standard error for continuous variables, proportions for categorical variables, and their 95% confidence intervals by levels of family history of diabetes. We tested for significant differences in the mean and prevalence across levels of family history of diabetes based on Satterthwaite-adjusted F statistics and on the χ2 test, respectively. All tests were 2 tailed at the α = 0.05 level of significance.

Assessment of risk models and improvements from using family history in detection of undiagnosed diabetes

For the global measure of models’ fit, we used the Akaike Information Criterion (AIC) estimated from the logistic regression models; a difference in AIC between 2 models of >2 was interpreted as a significant improvement for the model with the smaller AIC (34). For models’ calibration, we calculated Hosmer-Lemeshow goodness-of-fit statistics on the basis of deciles of risk (29). For the discrimination abilities of family history of diabetes, we constructed the weighted ROC curves and calculated the C statistics (35). To test for the significance of differences between AIC values, between weighted ROC curves, and between C statistics of different risk models, we used the rescaling bootstrap method of Cheng et al. (36) and Rao et al. (37) that takes into account the complex survey design by changing the sampling weights for each resample. We generated 1,000 rescaled bootstrap weights, calculated the distribution for the 2.5 and 97.5 percentiles, and reported these values as the 95% confidence intervals of the differences between different risk models (38).
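The 2 computational ingredients of this paragraph can be sketched as follows. The first function computes a survey-weighted C statistic as the weighted proportion of case-noncase pairs in which the case has the higher predicted risk (ties count one-half). The second generates one set of replicate weights under a common form of the rescaling bootstrap, in which n_h − 1 primary sampling units are drawn with replacement within each stratum that has n_h units. Both are illustrative assumptions about the general technique, not the SAS/SUDAAN code used in the study, and all names are hypothetical.

```python
import random


def weighted_c_statistic(risk, event, weight):
    """Weighted proportion of case-noncase pairs ranked correctly."""
    cases = [(r, w) for r, e, w in zip(risk, event, weight) if e]
    noncases = [(r, w) for r, e, w in zip(risk, event, weight) if not e]
    concordant = 0.0
    total = 0.0
    for r_case, w_case in cases:
        for r_non, w_non in noncases:
            pair_weight = w_case * w_non
            total += pair_weight
            if r_case > r_non:
                concordant += pair_weight
            elif r_case == r_non:
                concordant += 0.5 * pair_weight  # ties count one-half
    return concordant / total


def rescaled_bootstrap_weights(weight, stratum, psu, rng):
    """One replicate of rescaled weights (n_h - 1 PSU draws per stratum)."""
    psus_by_stratum = {}
    for s, p in zip(stratum, psu):
        psus_by_stratum.setdefault(s, set()).add(p)
    factor = {}
    for s, psus in psus_by_stratum.items():
        units = sorted(psus)
        n_h = len(units)
        if n_h < 2:
            raise ValueError("each stratum needs at least 2 PSUs")
        draws = [rng.choice(units) for _ in range(n_h - 1)]
        for p in units:
            # Multiplier: times drawn, rescaled by n_h / (n_h - 1).
            factor[(s, p)] = draws.count(p) * n_h / (n_h - 1)
    return [w * factor[(s, p)] for w, s, p in zip(weight, stratum, psu)]


# Toy data: 4 people, 2 strata with 2 PSUs each.
c_stat = weighted_c_statistic([0.3, 0.5, 0.1], [1, 0, 0], [1.0, 1.0, 3.0])
replicate = rescaled_bootstrap_weights(
    [1.0, 2.0, 3.0, 4.0], [1, 1, 2, 2], [1, 2, 1, 2], random.Random(0)
)
```

Repeating the second step 1,000 times, refitting both models under each replicate weight set, and taking the 2.5th and 97.5th percentiles of the resulting differences would give percentile intervals of the kind reported in Table 2.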

The predictiveness curve described earlier is an integrated plot of predicted risks from logistic regression models formed by the percentiles of risk in the population (24). From the predictiveness curves, one could read off the predicted probability of an event for any corresponding true-positive fraction (sensitivity) or false-positive fraction (1 − specificity). We constructed the weighted predictiveness curves. For the summary measure of weighted predictiveness curves, we calculated the proportion-explained variations (R2) for each risk model and used the rescaling bootstrap method to make the inference about significant differences between the different R2 variations (37). The difference between R2 variations is equivalent to the integrated discrimination improvement index proposed by Pencina et al. (23) and Pepe et al. (39) that measures the ability of the additional risk factor to increase the predicted probability among those who had the event and to decrease the predicted probability among those who were event free (23).
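A minimal sketch of the integrated discrimination improvement follows, assuming the discrimination-slope form: for each model, the weighted mean predicted risk among people with the event minus the weighted mean among those without it, with the improvement taken as the difference between the 2 models. The function names and toy data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def discrimination_slope(risk, event, weight):
    """Mean predicted risk in events minus mean predicted risk in nonevents."""
    ev = [(r, w) for r, e, w in zip(risk, event, weight) if e]
    non = [(r, w) for r, e, w in zip(risk, event, weight) if not e]
    mean_events = weighted_mean([r for r, _ in ev], [w for _, w in ev])
    mean_nonevents = weighted_mean([r for r, _ in non], [w for _, w in non])
    return mean_events - mean_nonevents


def idi(risk_old, risk_new, event, weight):
    """Integrated discrimination improvement of the new model over the old."""
    return (discrimination_slope(risk_new, event, weight)
            - discrimination_slope(risk_old, event, weight))


# Toy data: the new model raises risk in events and lowers it in nonevents.
event = [1, 1, 0, 0]
weight = [1.0, 1.0, 1.0, 1.0]
improvement = idi([0.2, 0.2, 0.1, 0.1], [0.3, 0.3, 0.05, 0.05], event, weight)
```

A positive value means that the added risk factor separates the 2 groups further on the predicted-risk scale.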

For risk prediction, it is important to examine if the model with the additional risk factor can more accurately stratify individuals into higher or lower risk categories (risk reclassification) (21). Some recently developed risk reclassification measures require use of recognized risk thresholds (22, 23), but at present no researchers or clinicians have proposed any risk classification schemes (risk thresholds) for clinical use in identifying higher- or lower-risk patients for diabetes. Nor have they proposed follow-up tests such as glucose testing for people at higher or lower risk to identify those who really have diabetes. Accordingly, we used logistic regression model 1 to determine the predicted probability of events that corresponded approximately to 20%, 40%, 60%, and 80% of undiagnosed diabetes (dichotomous cutpoints at quintiles of sensitivity) and used these probability thresholds to identify the true-positive fraction and false-positive fraction from the predictiveness curves. We also calculated the net reclassification improvement index, positive predictive values, and negative predictive values for each dichotomous risk threshold for model 1 and model 2, respectively. The net reclassification improvement index is a special case of integrated discrimination improvement with the recognized risk thresholds (23). We used the rescaled bootstrap method with 1,000 samples to estimate the 95% confidence intervals of integrated discrimination improvement and net reclassification improvement (39).
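For a single dichotomous threshold, the quantities above reduce to simple weighted fractions. The sketch below assumes that people with predicted risk at or above the threshold are classified as higher risk, and that the net reclassification improvement for 2 categories equals the gain in true-positive fraction minus the change in false-positive fraction. Function names and toy data are hypothetical.

```python
def fractions_at_threshold(risk, event, weight, threshold):
    """Weighted true-positive and false-positive fractions at a threshold."""
    tp = fp = cases = noncases = 0.0
    for r, e, w in zip(risk, event, weight):
        flagged = r >= threshold  # classified as higher risk
        if e:
            cases += w
            tp += w if flagged else 0.0
        else:
            noncases += w
            fp += w if flagged else 0.0
    return tp / cases, fp / noncases


def nri_at_threshold(risk_old, risk_new, event, weight, threshold):
    """Two-category net reclassification improvement at one threshold."""
    tpf_old, fpf_old = fractions_at_threshold(risk_old, event, weight, threshold)
    tpf_new, fpf_new = fractions_at_threshold(risk_new, event, weight, threshold)
    # Gain among events plus gain (reduced false positives) among nonevents.
    return (tpf_new - tpf_old) - (fpf_new - fpf_old)


event = [1, 1, 0, 0]
weight = [1.0, 1.0, 1.0, 1.0]
old = [0.20, 0.05, 0.20, 0.05]
new = [0.20, 0.15, 0.05, 0.05]
nri = nri_at_threshold(old, new, event, weight, threshold=0.10)
```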

To help to determine whether including a risk factor in a risk model might be cost-effective, we used decision curve analysis (25). Briefly, decision curve analysis estimates the net benefit of a model by taking the difference between the number of true positives and the number of false positives weighted by the odds of the selected threshold probability of risk for a range of threshold probabilities (25, 40). The net benefit of a model compared with the reference net benefit or compared with another model might be interpreted as the net increase in the proportion of cases identified. The reference was calculated by assuming that all people were tested for the events, and testing no one was set to a net benefit of zero. For any given threshold probability cutpoint, the risk model with the higher net benefit is preferred (41). We calculated and plotted the weighted net benefit curves for the reference model (testing all), model 1, and model 2, respectively. We used the quintile cutpoints of the predicted probabilities to compare the net benefit curves of model 1 and model 2 and calculated the difference in net benefit between the 2 models at each cutpoint, together with the 95% confidence interval of the difference, using the rescaled bootstrap method. Unless otherwise specified, data were analyzed by using SAS, version 9.2, software (SAS Institute, Inc., Cary, North Carolina) and SUDAAN, release 9.0, software (Research Triangle Institute, Research Triangle Park, North Carolina) to account for the complex sampling design of NHANES 1999–2004 (42).
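The net benefit calculation described above can be sketched with the standard decision-curve formula: the weighted proportion of true positives minus the weighted proportion of false positives multiplied by the odds of the threshold probability. The function names and toy data are hypothetical, and the weighting shown is an assumption about how survey weights would enter the formula.

```python
def net_benefit(risk, event, weight, threshold):
    """Weighted net benefit of testing people with risk >= threshold."""
    total = sum(weight)
    tp = sum(w for r, e, w in zip(risk, event, weight) if r >= threshold and e)
    fp = sum(w for r, e, w in zip(risk, event, weight) if r >= threshold and not e)
    odds = threshold / (1.0 - threshold)  # weight given to a false positive
    return tp / total - (fp / total) * odds


def net_benefit_test_all(event, weight, threshold):
    """Reference strategy: everyone is tested regardless of predicted risk."""
    total = sum(weight)
    prevalence = sum(w for e, w in zip(event, weight) if e) / total
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


risk = [0.9, 0.8, 0.3, 0.1]
event = [1, 0, 1, 0]
weight = [1.0, 1.0, 1.0, 1.0]
nb_model = net_benefit(risk, event, weight, threshold=0.2)
nb_all = net_benefit_test_all(event, weight, threshold=0.2)
```

Testing no one has a net benefit of zero by construction, so a model is useful at a given threshold only if its net benefit exceeds both zero and the test-all reference.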

RESULTS

NHANES 1999–2004 surveyed 5,551 adults aged ≥20 years without diagnosed diabetes who were asked for a blood sample after fasting overnight. The 498 persons excluded included 324 pregnant women, 1 person with unknown diabetes status, and 173 people with missing covariates. Of the final sample (n = 5,053), 73.6% were non-Hispanic white; 10.5%, non-Hispanic black; 7.2%, Mexican American; and 8.8%, other race/ethnicity.

The prevalence of undiagnosed diabetes, adjusted odds ratios, and characteristics of the people by level of familial risk of diabetes are summarized in Table 1. The prevalence increased significantly with level of familial risk from 2.2% (95% confidence interval (CI): 1.7, 2.6) to 7.2% (95% CI: 4.2, 10.1) (P = 0.001). The adjusted odds ratio increased from 1.7 (95% CI: 1.2, 2.5) to 3.8 (95% CI: 2.2, 6.3) for moderate and high familial risk, respectively. Familial risk of diabetes was significantly associated with all the selected covariates except for physical activity.

Table 1. Characteristics of Participants by Family History of Diabetes, National Health and Nutrition Examination Survey, 1999–2004

| Characteristic | Sample, no. | Average Familial Risk (n = 3,526) | 95% CI | Moderate Familial Risk (n = 1,150) | 95% CI | High Familial Risk (n = 377) | 95% CI | P Value^a |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Prevalence of undiagnosed diabetes, % | 200 | 2.2 | 1.7, 2.6 | 3.6 | 2.5, 4.7 | 7.2 | 4.2, 10.1 | 0.001 |
| Adjusted odds ratio^b | | 1.0 | | 1.7 | 1.2, 2.5 | 3.8 | 2.2, 6.3 | <0.001 |
| Mean age, years (SE) | 5,053 | 44.7 (0.43) | | 46.5 (0.75) | | 47.3 (0.87) | | 0.002 |
| Gender, % | | | | | | | | |
| Male | 2,556 | 49.9 | 48.5, 51.4 | 48.5 | 45.3, 51.8 | 41.6 | 35.0, 48.5 | |
| Female | 2,497 | 50.1 | 48.6, 51.5 | 51.5 | 48.2, 54.7 | 58.4 | 51.5, 65.0 | 0.046 |
| Race/ethnicity, % | | | | | | | | |
| Non-Hispanic white | 2,682 | 74.8 | 71.2, 78.1 | 72.6 | 67.6, 77.0 | 63.5 | 55.3, 70.9 | |
| Non-Hispanic black | 907 | 9.8 | 7.9, 12.0 | 11.1 | 8.6, 14.2 | 16.3 | 11.9, 21.8 | |
| Mexican American | 1,131 | 6.6 | 5.2, 8.3 | 8.4 | 5.9, 11.7 | 10.3 | 7.3, 14.4 | |
| Other (including other Hispanic) | 333 | 8.9 | 6.6, 11.9 | 8.0 | 5.4, 11.5 | 9.9 | 6.0, 15.9 | <0.001 |
| Body mass index category, % | | | | | | | | |
| <18.5 kg/m2 | 71 | 2.1 | 1.6, 2.9 | 1.0 | 0.5, 2.1 | 1.0 | 0.3, 3.2 | |
| 18.5–24.9 kg/m2 | 1,628 | 37.4 | 35.1, 39.8 | 27.7 | 24.2, 31.5 | 26.9 | 21.2, 33.6 | |
| 25–29.9 kg/m2 | 1,843 | 34.7 | 32.3, 37.2 | 35.9 | 32.4, 39.5 | 35.1 | 27.7, 43.2 | |
| ≥30 kg/m2 | 1,511 | 25.8 | 23.7, 27.9 | 35.4 | 32.4, 38.5 | 37.0 | 30.5, 44.0 | <0.001 |
| Physical activity (n = 4,889), % | | | | | | | | |
| Inactive | 2,023 | 33.4 | 31.0, 35.8 | 36.2 | 32.8, 39.7 | 39.5 | 33.7, 45.7 | |
| Irregularly active | 1,649 | 38.5 | 36.5, 40.6 | 36.3 | 32.0, 40.9 | 36.8 | 30.5, 43.5 | |
| Regularly active | 1,240 | 28.1 | 25.8, 30.5 | 27.5 | 23.6, 31.8 | 23.7 | 18.6, 29.7 | 0.136 |
| Hypertension, % | | | | | | | | |
| Yes | 2,010 | 31.2 | 29.2, 33.4 | 38.0 | 34.8, 41.3 | 40.0 | 33.9, 46.3 | |
| No | 3,043 | 68.8 | 66.6, 70.8 | 62.0 | 58.7, 65.2 | 60.0 | 53.7, 66.1 | <0.001 |
| Lipid, % | | | | | | | | |
| HDL-C of ≤35 mg/dL or triglycerides of ≥250 mg/dL | 851 | 15.4 | 14.0, 16.9 | 20.8 | 17.6, 24.5 | 21.2 | 15.7, 27.9 | |
| Other | 4,202 | 84.6 | 83.1, 86.0 | 79.2 | 75.5, 82.4 | 78.8 | 72.1, 84.3 | 0.011 |
| History of heart disease, % | | | | | | | | |
| Yes | 371 | 4.9 | 4.1, 5.7 | 5.9 | 4.4, 7.9 | 10.5 | 7.2, 15.0 | |
| No | 4,654 | 95.1 | 94.3, 95.9 | 94.1 | 92.1, 95.6 | 89.5 | 85.0, 92.8 | 0.005 |
| Overall | 5,053 | 71.9 | 69.9, 73.7 | 21.8 | 20.0, 23.8 | 6.3 | 5.5, 7.2 | <0.001 |

Abbreviations: CI, confidence interval; HDL-C, high density lipoprotein cholesterol; SE, standard error.

^a For prevalence and odds ratios of undiagnosed diabetes, P values were for the trend across the categories of family history of diabetes based on the Satterthwaite-adjusted F test; for categorical variables, P values were based on the χ2 test; all tests were 2 tailed.

^b Adjusted for age, gender, body mass index, hypertension, a HDL-C level of ≤35 mg/dL (0.90 mmol/L) and/or a triglyceride level of ≥250 mg/dL, and family history of diabetes.

Assessing improvements in the detection of undiagnosed diabetes by using family history

Table 2 includes several statistical measures of overall fit, discrimination ability, and reclassification of risk for models 1 and 2. Compared with model 1, model 2 represented significant improvements in 3 statistical measures in detecting undiagnosed diabetes: a lower AIC, a significant improvement in the weighted C statistic, and a significant improvement in reclassification as measured by integrated discrimination improvement. Models 1 and 2 demonstrated similar levels of calibration (goodness-of-fit tests), suggesting the adequate fit of both models.

Table 2. Comparison of 2 Models’ Fit, Discrimination Ability, and Risk Reclassification, National Health and Nutrition Examination Survey, 1999–2004

| Statistical Measures of Undiagnosed Diabetes | Without Family History (Model 1)^b | With Family History (Model 2)^c | Difference (Model 1 − Model 2) | 95% CI^a |
| --- | --- | --- | --- | --- |
| AIC^d | 590 | 579 | 10.9 | 1.4, 24.3 |
| Goodness-of-fit test^e | 12.6 (0.126) | 7.0 (0.534) | | |
| Weighted C statistics | 0.826 | 0.842 | 0.016 | 0.005, 0.031 |
| R2/IDI | 0.055 | 0.067 | 0.012^f | 0.004, 0.030 |

Abbreviations: AIC, Akaike Information Criterion; CI, confidence interval; IDI, integrated discrimination improvement.

^a The 2.5 and 97.5 percentile distributions of 1,000 rescaled bootstrap samples of the differences between the different risk models.

^b Model 1 was adjusted for age, gender, body mass index, hypertension, and a high density lipoprotein cholesterol level of ≤35 mg/dL (0.90 mmol/L) and/or a triglyceride level of ≥250 mg/dL.

^c Model 2 included, in addition to the risk factors in model 1, family history of diabetes.

^d The means and differences of AIC were generated from 1,000 rescaled bootstrap samples for the different risk models.

^e Hosmer-Lemeshow goodness-of-fit test; the numbers are χ2, with P values in parentheses.

^f The difference between the R2 of the 2 risk models equals the IDI.

Figure 1A plots the weighted predictiveness curves, and Figure 1B shows the weighted true-positive fraction and false-positive fraction by risk percentiles in the population. These graphs show that using family history of diabetes, in addition to the selected risk factors, reclassified the people with undiagnosed diabetes to the higher predicted risk and the diabetes-free people to the lower predicted risk. Appendix Table 1 presents a detailed analysis of the selected risk thresholds. In a comparison of model 2 with model 1, for a higher risk threshold (e.g., at 7.3%, or approximately the 89th percentile of risk distribution in the population) (Figure 1), the weighted true-positive fraction (Appendix Table 1) increased from 40.0% (95% CI: 29.4, 51.5) in model 1 to 49.4% (95% CI: 37.9, 60.9) in model 2. The weighted positive predictive value rose from 11.0% (95% CI: 8.5, 14.4) to 14.2% (95% CI: 10.9, 18.2), and the net reclassification improvement in model 2 was 10.1% (95% CI: 1.0, 18.1; P = 0.009). The weighted false-positive fraction and the negative predictive value remained largely unchanged at this risk threshold. At this level of risk, model 2 would identify approximately 620,000 (95% CI: 221,100, 1,020,000) more cases of undiagnosed diabetes in the population than would model 1 (3.26 million vs. 2.64 million). As the risk thresholds lowered, model 2 was associated with a decreased false-positive fraction and little change in negative predictive value compared with model 1. However, these changes were too small to yield a significant improvement in risk reclassification as measured by net reclassification improvement.
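The population counts quoted above can be checked with simple arithmetic. The sketch assumes that the true-positive fractions are applied to the estimated 6.6 million undiagnosed cases cited elsewhere in this article; the study's own counts come from the survey weights, so agreement is approximate. Variable names are hypothetical.

```python
total_undiagnosed = 6.6e6  # estimated undiagnosed cases in the United States
tpf_model1 = 0.400         # weighted true-positive fraction, model 1
tpf_model2 = 0.494         # weighted true-positive fraction, model 2

cases_model1 = total_undiagnosed * tpf_model1  # about 2.64 million
cases_model2 = total_undiagnosed * tpf_model2  # about 3.26 million
extra_cases = cases_model2 - cases_model1      # about 0.62 million

# Relative gain in cases identified when family history is added.
relative_increase = tpf_model2 / tpf_model1 - 1.0
```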

Figure 1. Weighted predictiveness curves (A) and true-positive fraction (TPF) and false-positive fraction (FPF, 1 − specificity) (B) for model with selected risk factors (model 1) and model with selected risk factors plus family history of diabetes (model 2), National Health and Nutrition Examination Survey, 1999–2004. The horizontal dashed line in A indicates the prevalence of undiagnosed diabetes in the population (2.9%).

Decision curve analysis

Figure 2 presents the weighted net benefit curves derived for testing all people versus testing strategies based on model 1 and model 2. Model 2 appeared to offer greater net benefit across most risk thresholds, especially from the predicted risk of around 5% to 15%. Both of the model-based net benefits were higher than testing all (the reference testing strategy). Appendix Table 2 presents the detailed analysis of net benefits for 4 selected risk thresholds. Comparing model 2 with model 1, for example, at a 7.3% risk threshold (40% sensitivity based on model 1), the difference of net benefit equals 0.32 per 100 people (95% CI: 0.06, 0.58), indicating that 3 extra cases of undiagnosed diabetes would be detected per 1,000 subjects based on model 2. The differences in net benefits between the 2 models diminished at either higher or lower risk thresholds, especially at the lower risk thresholds.

Figure 2. Weighted decision curves for models predicting undiagnosed diabetes using models with family history of diabetes (solid line) and without this history (small dashed line), National Health and Nutrition Examination Survey, 1999–2004. The dash-dot-dot-dash line indicates the net benefit of testing all people, and the horizontal dashed line indicates testing none of the people. The y axis indicates the number of true cases identified per 100 people.

DISCUSSION

This study confirms that family history of diabetes is an independent risk factor for undiagnosed diabetes, a finding that is consistent with those of many other studies (6–15, 43). Recent National Institutes of Health state-of-the-science statements on family history recognized the important role of family history in the practice of medicine, motivation of positive lifestyle changes, and influence of clinical interventions (44). Our study assessed the improvements in detecting undiagnosed diabetes that would come from including family history in risk assessment and population screening. Our findings suggest that using a risk model with family history of diabetes offers significant improvements over a model with common risk factors in detecting undiagnosed diabetes, especially among populations at higher risk. For example, by using a risk threshold of 7.3% (the median predicted risk = 1.3% in the population), approximately 11% of the population had a predicted risk ≥7.3% based on model 1. With model 2 we had a net reclassification improvement of 10.1% (95% CI: 1.0, 18.1; P = 0.009) that was mainly due to the increase in true-positive fraction from 40.0% (95% CI: 29.4, 51.5) in model 1 to 49.4% (95% CI: 37.9, 60.9) in model 2, a 24% increase in the number of undiagnosed diabetes cases identified. In other words, using model 2 at a risk threshold of 7.3%, one would identify approximately 3.26 million cases instead of 2.64 million cases of undiagnosed diabetes of an estimated 6.6 million total cases without an increase in false-positive fraction.

Some researchers have argued that the statistical measures of risk models for performance in prediction and reclassification have limited value for evaluation of the clinical utility of the additional risk factor/marker because they do not consider cost-effectiveness (25, 41, 45). However, the traditional cost-effectiveness analysis of diagnostic tests has involved collecting additional data on alternative treatments that could involve substantial cost and sometimes might be difficult to collect (46, 47). The decision curve analysis, which does not require collecting additional data on cost and effectiveness, offers a simple approach to examining the clinical consequences of alternative testing strategies and to comparing the different risk models in terms of net benefits over a range of predicted probabilities for an event (25). The focus of the net benefit curves is not on any particular point estimate, but rather on the entire range of threshold probabilities in a way that one net benefit curve is greater or lesser than the other alternatives (25, 48). Our findings indicate that the net benefit curves derived from model 2 (versus model 1) were greater over nearly the whole range of risk thresholds, especially from 5% to 15% predicted risks, indicating the net benefit of detecting extra cases of undiagnosed diabetes based on model 2. Given the fact that little cost might be involved in collecting information on family history of diabetes, the evaluation of added value of using family history should mainly focus on the magnitude of the benefit rather than on cost-effectiveness.

The limitations to our study include, first, that NHANES is a cross-sectional survey, and it cannot be used to predict the risk of developing diabetes. Accordingly, we focused our analysis on the improvements in detecting undiagnosed diabetes that might be realized by incorporating family history of diabetes in a model. Second, NHANES 1999–2004 measured fasting glucose but did not assess glucose tolerance, and thus it might have underestimated the prevalence of diabetes. However, the American Diabetes Association has recommended that, for epidemiologic studies and estimates of diabetes prevalence, a fasting plasma glucose level of ≥126 mg/dL (7.0 mmol/L) should be used (49). Third, diabetes was self-reported in NHANES 1999–2004, and reporting bias by different groups might exist. Studies indicated that the proportion of undiagnosed diabetes was higher in men, Mexican Americans, and the uninsured compared with women, non-Hispanic whites, and the insured, suggesting some reporting bias of diagnosed diabetes (50). Undiagnosed diabetes might therefore be overrepresented in certain groups in NHANES 1999–2004. Fourth, the family risk of diabetes was significantly related to sex and race/ethnicity (31, 51, 52). Women tend to have better knowledge of disease among their relatives, and larger families (for example, those of non-Hispanic blacks and Mexican Americans compared with non-Hispanic whites) are more likely than smaller families to include relatives with diabetes, especially in populations where the disease prevalence is high. To examine the possible effect of sex, ethnicity, or racial differences in the familial risk of diabetes on the detection of undiagnosed diabetes, we conducted stratified analysis by sex and race/ethnicity; the results suggested that the improvements in detecting undiagnosed diabetes by using family history of diabetes are consistent across sex and race/ethnicity strata (Appendix Table 3).
Fifth, there are no generally recognized risk thresholds for undiagnosed diabetes; we arbitrarily used the quintile cutpoints of predicted risk that included 20%, 40%, 60%, or 80% of undiagnosed diabetes cases based on risk model 1. Some statistical measures of how well a model performs in prediction, such as net reclassification improvement, might be sensitive to the risk thresholds used (23). Sixth, using the same data to fit a risk model and to assess its performance could lead to overfitting. We conducted 5-fold cross-validation and obtained an average weighted area under the curve of 0.84 for the final model with family history, and external validation using the NHANES III (1988–1994) data set obtained a weighted area under the curve of 0.89, indicating adequate performance of our risk models.

The major strengths of our study include the availability of fasting glucose measurements from a nationally representative sample of the US adult population and the large number of potential risk factors for undiagnosed diabetes to investigate.

Our findings suggest that family history of diabetes provides significant improvements in the detection of additional cases of undiagnosed diabetes, especially among people with higher predicted risk. A risk model that includes family history also provides greater net benefit than a model without it when applied to the US population. Unlike biomarkers (for example, prostate-specific antigen for prostate cancer or C-reactive protein for cardiovascular diseases) or genetic testing, obtaining information on family history of diabetes costs little and carries no adverse effects. With increased awareness and education, family history could be a useful part of a public health tool designed for the detection and control of diabetes in populations.

Author affiliations: National Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, Georgia (Quanhe Yang, Tiebin Liu, Rodolfo Valdez, Muin J. Khoury); and Office of Minority Health, Centers for Disease Control and Prevention, Atlanta, Georgia (Ramal Moonesinghe).

The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Conflict of interest: none declared.

Abbreviations: AIC, Akaike Information Criterion; CDC, Centers for Disease Control and Prevention; CI, confidence interval; NHANES, National Health and Nutrition Examination Survey; ROC, receiver-operating characteristic.

Weighted True-Positive Fraction, False-Positive Fraction, Positive Predictive Value, Negative Predictive Value, and Net Reclassification Index of Undiagnosed Diabetes Using Risk Models With and Without Family History of Diabetes, National Health and Nutrition Examination Survey, 1999–2004

| Predicted Probability of Events, % | True-Positive Fraction, % (95% CI) | False-Positive Fraction, % (95% CI) | Positive Predictive Value, % (95% CI) | Negative Predictive Value, % (95% CI) | Net Reclassification Index, % (95% CI) | No. of Cases of Undiagnosed Diabetes Identified in Population (× 100,000) | No. of People ≥ Predicted Probability in Population (× 100,000) |
|---|---|---|---|---|---|---|---|
| Model Without Family History (Model 1)^a | | | | | | | |
| 12.0 | 20.0 (13.8, 29.6) | 3.3 (2.9, 3.9) | 15.8 (10.7, 22.6) | 97.6 (97.1, 98.0) | | 13.2 | 56.9 |
| 7.3 | 40.0 (29.4, 51.5) | 9.7 (9.0, 10.5) | 11.0 (8.5, 14.4) | 98.1 (97.4, 98.5) | | 26.4 | 170.9 |
| 5.4 | 60.0 (50.8, 69.5) | 15.7 (14.5, 16.9) | 10.5 (8.6, 12.7) | 98.6 (98.1, 99.0) | | 39.6 | 268.7 |
| 3.6 | 80.0 (71.5, 86.8) | 26.1 (24.5, 27.9) | 8.5 (7.3, 9.9) | 99.2 (98.7, 99.5) | | 52.8 | 448.3 |
| Model With Family History (Model 2)^b | | | | | | | |
| 12.0 | 27.2 (19.1, 37.1) | 3.7 (3.1, 4.4) | 18.3 (13.1, 25.0) | 97.8 (97.2, 98.2) | 6.3 (0.6, 12.1)^c | 17.9 | 63.1 |
| 7.3 | 49.4 (37.9, 60.9) | 9.1 (8.2, 10.0) | 14.2 (10.9, 18.2) | 98.4 (97.8, 98.8) | 10.1 (1.0, 18.1)^c | 32.6 | 159.4 |
| 5.4 | 60.0 (48.2, 70.4) | 14.4 (13.4, 15.5) | 11.2 (9.0, 13.9) | 98.7 (98.0, 99.0) | 0.7 (−7.1, 7.6)^c | 39.6 | 246.8 |
| 3.6 | 75.2 (66.8, 82.1) | 23.5 (22.0, 25.0) | 8.9 (7.5, 10.5) | 99.0 (98.6, 99.3) | −2.3 (−9.3, 3.8)^c | 49.6 | 403.5 |

Abbreviation: CI, confidence interval.

^a Model 1 included age, gender, body mass index, hypertension, and a high density lipoprotein cholesterol level of ≤35 mg/dL (0.90 mmol/L) and/or a triglyceride level of ≥250 mg/dL.

^b Model 2 included, in addition to model 1 risk factors, family history of diabetes.

^c The 95% confidence intervals of the net reclassification index were estimated by using 1,000 rescaled bootstrap samples for the complex surveys.
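
As a rough check on how the case counts in this table relate to the true-positive fractions, the sketch below multiplies a true-positive fraction by the weighted number of undiagnosed cases in the population. The total of about 6.6 million is not stated in the table; it is an assumption back-calculated from the model 1 row at the 7.3% threshold (26.4 × 100,000 cases at a true-positive fraction of 40.0%). The function and variable names are illustrative only.

```python
def cases_identified(true_positive_fraction, total_cases):
    """Number of undiagnosed cases flagged at a threshold, given the
    true-positive fraction (as a proportion) and the total number of cases."""
    return true_positive_fraction * total_cases


# Assumption: total undiagnosed cases back-calculated from the model 1 row
# at the 7.3% threshold (26.4 x 100,000 cases at a true-positive fraction of 0.400).
total_undiagnosed = 26.4 * 100_000 / 0.400

# True-positive fractions at the 7.3% threshold, taken from the table.
model_1_cases = cases_identified(0.400, total_undiagnosed)
model_2_cases = cases_identified(0.494, total_undiagnosed)

additional_cases = model_2_cases - model_1_cases
print(round(additional_cases))
```

Under that assumption, the difference at the 7.3% threshold comes to roughly 620,000 additional cases, in line with the gap between 32.6 and 26.4 (× 100,000) in the table.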

Weighted Net Benefit and Differences in Net Benefit for Testing All People for Undiagnosed Diabetes or According to Risk Models With or Without Family History Using Selected Thresholds of Predicted Probabilities of Undiagnosed Diabetes, National Health and Nutrition Examination Survey, 1999–2004

| Predicted Probability of Events, % | True-Positive Fraction, % (95% CI) | Models | Net Benefit, % (95% CI^a) | Differences Between Testing All vs. Model 1 and Model 1 vs. Model 2 (95% CI^a) |
|---|---|---|---|---|
| 12.0 | 20.0 (13.8, 29.6) | Testing all^b | −10.30 (−10.86, 9.85) | |
| | | Model 1^c | 0.16 (−0.09, 0.39) | 10.5 (9.99, 11.00) |
| | | Model 2^d | 0.32 (0.03, 0.58) | 0.16 (0.03, 0.33) |
| 7.3 | 40.0 (29.4, 51.5) | Testing all^b | −4.70 (−5.23, 4.27) | |
| | | Model 1^c | 0.44 (0.10, 0.74) | 5.14 (4.61, 5.65) |
| | | Model 2^d | 0.76 (0.37, 1.13) | 0.32 (0.06, 0.58) |
| 5.4 | 60.0 (50.8, 69.5) | Testing all^b | −2.61 (−3.13, 2.19) | |
| | | Model 1^c | 0.91 (0.52, 1.21) | 3.52 (3.11, 3.90) |
| | | Model 2^d | 0.96 (0.55, 1.33) | 0.05 (−0.12, 0.32) |
| 3.6 | 80.0 (71.5, 86.8) | Testing all^b | −0.69 (−1.2, 0.28) | |
| | | Model 1^c | 1.41 (1.00, 1.74) | 2.10 (1.82, 2.37) |
| | | Model 2^d | 1.36 (0.92, 1.68) | −0.03 (−0.23, 0.14) |

Abbreviation: CI, confidence interval.

^a Ninety-five percent confidence intervals of the difference in net benefit between testing all versus model 1 and model 1 versus model 2 were estimated by using 1,000 rescaled bootstrap samples for complex surveys.

^b Assuming that all people were tested for fasting glucose concentrations for diagnosis of diabetes.

^c Model 1 included age, gender, body mass index, hypertension, and a high density lipoprotein cholesterol level of ≤35 mg/dL (0.90 mmol/L) and/or a triglyceride level of ≥250 mg/dL.

^d Model 2 included, in addition to the risk factors of model 1, family history of diabetes.
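
The net benefit values in this table can be approximated with the usual decision curve definition (references 25 and 40): true positives per person minus false positives per person weighted by the odds of the risk threshold. The sketch below is a hedged illustration, not the authors' computation. The prevalence of about 2.94% is an assumption back-calculated from the "testing all" rows rather than a figure reported in the table, the true- and false-positive fractions are the rounded values from the preceding table, and the function name is hypothetical.

```python
def net_benefit(tpf, fpf, prevalence, threshold):
    """Net benefit per person at a risk threshold (all inputs as proportions):
    true positives minus false positives weighted by the threshold odds."""
    odds = threshold / (1.0 - threshold)
    true_positives = tpf * prevalence            # per person in the population
    false_positives = fpf * (1.0 - prevalence)   # per person in the population
    return true_positives - false_positives * odds


# Assumption: prevalence of undiagnosed diabetes back-calculated from the
# "testing all" rows; it is not reported directly in the table.
PREVALENCE = 0.0294
THRESHOLD = 0.12

test_all = net_benefit(1.0, 1.0, PREVALENCE, THRESHOLD)     # everyone is tested
model_1 = net_benefit(0.200, 0.033, PREVALENCE, THRESHOLD)  # without family history
model_2 = net_benefit(0.272, 0.037, PREVALENCE, THRESHOLD)  # with family history

for label, value in [("testing all", test_all), ("model 1", model_1), ("model 2", model_2)]:
    print(f"{label}: {100 * value:.2f}%")
```

With these rounded inputs the three values land close to the tabulated −10.30%, 0.16%, and 0.32% at the 12.0% threshold; small discrepancies are expected because the inputs are rounded and the published estimates are survey weighted.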

Comparison of Models’ Fit, Discrimination Ability, Risk Stratification, and Risk Reclassification Between Models With and Without Family History of Diabetes for Detecting Undiagnosed Diabetes Stratified by Sex and Race/Ethnicity, National Health and Nutrition Examination Survey, 1999–2004

| Statistical Measures | Without Family History (Model 1)^b | With Family History (Model 2)^c | Differences (Model 1 − Model 2) | 95% CI^a |
|---|---|---|---|---|
| Male | | | | |
| AIC^d | 357.4 | 356.3 | 1.1 | −3.6, 9.9 |
| Goodness-of-fit test^e | 7.4 (0.289) | 3.5 (0.743) | | |
| Weighted C statistics | 0.837 | 0.848 | 0.011 | 0.001, 0.024 |
| R²/IDI | 0.0677 | 0.0734 | 0.006 | −0.001, 0.031^f |
| Female | | | | |
| AIC^d | 247.3 | 239.2 | 8.1 | −0.5, 18.9 |
| Goodness-of-fit test^e | 5.1 (0.280) | 2.0 (0.732) | | |
| Weighted C statistics | 0.820 | 0.847 | 0.027 | 0.005, 0.054 |
| R²/IDI | 0.049 | 0.073 | 0.024 | 0.006, 0.065^f |
| Non-Hispanic white | | | | |
| AIC^d | 319.6 | 316.4 | 3.2 | −2.6, 11.4 |
| Goodness-of-fit test^e | 4.2 (0.124) | 3.1 (0.213) | | |
| Weighted C statistics | 0.848 | 0.863 | 0.015 | 0.003, 0.031 |
| R²/IDI | 0.063 | 0.073 | 0.010 | 0.001, 0.039^f |
| Non-Hispanic black | | | | |
| AIC^d | 131.9 | 128.3 | 3.6 | −3.2, 15.5 |
| Goodness-of-fit test^e | 3.2 (0.788) | 6.3 (0.392) | | |
| Weighted C statistics | 0.831 | 0.856 | 0.025 | −0.006, 0.061 |
| R²/IDI | 0.072 | 0.113 | 0.041 | 0.010, 0.125^f |
| Mexican American | | | | |
| AIC^d | 125.7 | 118.5 | 7.3 | −3.3, 22.2 |
| Goodness-of-fit test^e | 3.1 (0.381) | 4.6 (0.203) | | |
| Weighted C statistics | 0.854 | 0.896 | 0.042 | −0.003, 0.085 |
| R²/IDI | 0.064 | 0.125 | 0.061 | 0.010, 0.191^f |

Abbreviations: AIC, Akaike Information Criterion; CI, confidence interval; IDI, integrated discrimination improvement.

^a The 2.5 and 97.5 percentile distributions of 1,000 rescaled bootstrap samples of the differences between the different risk models.

^b Model 1 was adjusted for age, gender, body mass index, hypertension, and a high density lipoprotein cholesterol level of ≤35 mg/dL (0.90 mmol/L) and/or a triglyceride level of ≥250 mg/dL.

^c Model 2 included, in addition to the risk factors in model 1, family history of diabetes.

^d The means and differences of AIC were generated from 1,000 rescaled bootstrap samples for the different risk models.

^e Hosmer-Lemeshow goodness-of-fit test; the numbers are χ², with P values in parentheses.

^f The difference between the R² of the 2 risk models equals the IDI.
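
The integrated discrimination improvement referred to in the last footnote is commonly defined (reference 23) as the change in discrimination slope, that is, the mean predicted risk among cases minus the mean predicted risk among non-cases, when moving from the old model to the new one. The following is a minimal weighted sketch of that definition; the function names and toy predictions are hypothetical, and it is not the authors' implementation.

```python
def weighted_mean(values, weights):
    """Weighted average of a sequence of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def discrimination_slope(probs, labels, weights):
    """Mean predicted risk among cases minus mean predicted risk among non-cases."""
    case_p = [p for p, y in zip(probs, labels) if y == 1]
    case_w = [w for w, y in zip(weights, labels) if y == 1]
    ctrl_p = [p for p, y in zip(probs, labels) if y == 0]
    ctrl_w = [w for w, y in zip(weights, labels) if y == 0]
    return weighted_mean(case_p, case_w) - weighted_mean(ctrl_p, ctrl_w)


def idi(probs_old, probs_new, labels, weights):
    """Integrated discrimination improvement: gain in discrimination slope."""
    return (discrimination_slope(probs_new, labels, weights)
            - discrimination_slope(probs_old, labels, weights))


# Hypothetical predicted risks from two models for four people (toy data).
labels = [1, 1, 0, 0]
weights = [1.0, 1.0, 1.0, 1.0]
old_model = [0.6, 0.4, 0.3, 0.1]
new_model = [0.8, 0.6, 0.2, 0.2]
print(idi(old_model, new_model, labels, weights))
```

A positive value means the new model separates cases from non-cases more widely on the predicted-risk scale, which is the direction reported for every stratum in the table.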

1. Yoon PW, Scheuner MT, Khoury MJ. Research priorities for evaluating family history in the prevention of common chronic diseases. Am J Prev Med. 2003;24(2):128–135. PMID: 12568818.
2. Guttmacher AE, Collins FS, Carmona RH. The family history—more important than ever. N Engl J Med. 2004;351(22):2333–2336. PMID: 15564550.
3. Yoon PW, Scheuner MT, Peterson-Oehlke KL. Can family history be used as a tool for public health and preventive medicine? Genet Med. 2002;4(4):304–310. PMID: 12172397.
4. Rich EC, Burke W, Heaton CJ. Reconsidering the family history in primary care. J Gen Intern Med. 2004;19(3):273–280. PMID: 15009784.
5. American Diabetes Association. Economic costs of diabetes in the U.S. in 2007. Diabetes Care. 2008;31(3):596–615. PMID: 18308683.
6. Baan CA, Ruige JB, Stolk RP. Performance of a predictive model to identify undiagnosed diabetes in a health care setting. Diabetes Care. 1999;22(2):213–219. PMID: 10333936.
7. Herman WH, Smith PJ, Thompson TJ. A new and simple questionnaire to identify people at increased risk for undiagnosed diabetes. Diabetes Care. 1995;18(3):382–387. PMID: 7555482.
8. Kanaya AM, Wassel Fyr CL, de Rekeneire N. Predicting the development of diabetes in older adults: the derivation and validation of a prediction rule. Diabetes Care. 2005;28(2):404–408. PMID: 15677800.
9. Lindström J, Louheranta A, Mannelin M. The Finnish Diabetes Prevention Study (DPS): lifestyle intervention and 3-year results on diet and physical activity. Diabetes Care. 2003;26(12):3230–3236. PMID: 14633807.
10. Lindström J, Tuomilehto J. The diabetes risk score: a practical tool to predict type 2 diabetes risk. Diabetes Care. 2003;26(3):725–731. PMID: 12610029.
11. Tabaei BP, Engelgau MM, Herman WH. A multivariate logistic regression equation to screen for dysglycaemia: development and validation. Diabet Med. 2005;22(5):599–605. PMID: 15842515.
12. Tunstall-Pedoe H. The Dundee coronary risk-disk for management of change in risk factors. BMJ. 1991;303(6805):744–747. PMID: 1932933.
13. Schwarz PE, Li J, Lindstrom J. Tools for predicting the risk of type 2 diabetes in daily practice. Horm Metab Res. 2009;41(2):86–97. PMID: 19021089.
14. Harrison TA, Hindorff LA, Kim H. Family history of diabetes as a potential public health tool. Am J Prev Med. 2003;24(2):152–159. PMID: 12568821.
15. Valdez R, Yoon PW, Liu T. Family history and prevalence of diabetes in the U.S. population: the 6-year results from the National Health and Nutrition Examination Survey (1999–2004). Diabetes Care. 2007;30(10):2517–2522. PMID: 17634276.
16. Hariri S, Yoon PW, Moonesinghe R. Evaluation of family history as a risk factor and screening tool for detecting undiagnosed diabetes in a nationally representative survey population. Genet Med. 2006;8(12):752–759. PMID: 17172938.
17. Hariri S, Yoon PW, Qureshi N. Family history of type 2 diabetes: a population-based screening tool for prevention? Genet Med. 2006;8(2):102–108. PMID: 16481893.
18. Valdez R, Greenlund KJ, Khoury MJ. Is family history a useful tool for detecting children at risk for diabetes and cardiovascular diseases? A public health perspective. Pediatrics. 2007;120(suppl 2):S78–S86. PMID: 17767009.
19. Pepe MS, Janes H, Longton G. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882–890. PMID: 15105181.
20. Fletcher RH, Fletcher SW. Clinical Epidemiology: The Essentials. 4th ed. Baltimore, MD: Lippincott Williams & Wilkins; 2005.
21. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928–935. PMID: 17309939.
22. Cook NR, Ridker PM. Advances in measuring the effect of individual predictors of cardiovascular risk: the role of reclassification measures. Ann Intern Med. 2009;150(11):795–802. PMID: 19487714.
23. Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157–172; discussion 207–212. PMID: 17569110.
24. Pepe MS, Feng Z, Huang Y. Integrating the predictiveness of a marker with its performance as a classifier. Am J Epidemiol. 2008;167(3):362–368. PMID: 17982157.
25. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26(6):565–574. PMID: 17099194.
26. American Diabetes Association. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2009;32(suppl 1):S62–S67. PMID: 19118289.
27. American Diabetes Association. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care. 2003;26(suppl 1):S5–S20. PMID: 12502614.
28. Kleinbaum DG. Logistic Regression: A Self-Learning Text. New York, NY: Springer; 1994.
29. Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd ed. New York, NY: Wiley; 2000.
30. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York, NY: Wiley; 1980.
31. Annis AM, Caulder MS, Cook ML. Family history, diabetes, and other demographic and risk factors among participants of the National Health and Nutrition Examination Survey 1999–2002 [electronic article]. Prev Chronic Dis. 2005;2(2):A19. PMID: 15888230.
32. Heikes KE, Eddy DM, Arondekar B. Diabetes risk calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes. Diabetes Care. 2008;31(5):1040–1045. PMID: 18070993.
33. Graubard BI, Korn EL. Predictive margins with survey data. Biometrics. 1999;55(2):652–659. PMID: 11318229.
34. Burnham KP. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York, NY: Springer; 2002.
35. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36. PMID: 7063747.
36. Cheng NF, Han PZ, Gansky SA. Methods and software for estimating health disparities: the case of children's oral health. Am J Epidemiol. 2008;168(8):906–914. PMID: 18779387.
37. Rao JNK, Wu CFJ, Yue K. Some recent work on resampling methods for complex surveys. Surv Methodol. 1992;18(2):209–217.
38. Efron B, Tibshirani R. An Introduction to the Bootstrap. New York, NY: Chapman & Hall; 1993.
39. Pepe MS, Feng Z, Gu JW. Comments on ‘Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’ by M.J. Pencina et al, Statistics in Medicine (DOI: 10.1002/sim.2929). Stat Med. 2008;27(2):173–181.
40. Vickers AJ, Cronin AM, Elkin EB. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers [electronic article]. BMC Med Inform Decis Mak. 2008;8:53. PMID: 19036144.
41. Vickers AJ. Decision analysis for the evaluation of diagnostic tests, prediction models and molecular markers. Am Stat. 2008;62(4):314–320. PMID: 19132141.
42. Shah BV, Barnwell BG, Bieler GS. SUDAAN User's Manual, Release 9.0. Research Triangle Park, NC: Research Triangle Institute; 2005.
43. Eddy DM, Schlessinger L. Validation of the Archimedes diabetes model. Diabetes Care. 2003;26(11):3102–3110. PMID: 14578246.
44. Berg AO, Baird MA, Botkin JR. National Institutes of Health State-of-the-Science Conference statement: family history and improving health. Ann Intern Med. 2009;151(12):872–877. PMID: 19884615.
45. Vickers AJ, Elkin EB, Steyerberg E. Net reclassification improvement and decision theory. Stat Med. 2009;28(3):525–526; author reply 526–528. PMID: 17907248.
46. Mushlin AI, Ruchlin HS, Callahan MA. Cost-effectiveness of diagnostic tests. Lancet. 2001;358(9290):1353–1355. PMID: 11684235.
47. Petitti DB. Meta-analysis, Decision Analysis, and Cost-Effectiveness Analysis: Methods for Quantitative Synthesis in Medicine. 2nd ed. New York, NY: Oxford University Press; 2000.
48. Steyerberg EW, Vickers AJ. Decision curve analysis: a discussion. Med Decis Making. 2008;28(1):146–149. PMID: 18263565.
49. Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. Report of the Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. Diabetes Care. 2003;26(suppl 1):S5–S20. PMID: 12502614.
50. Danaei G, Friedman AB, Oza S. Diabetes prevalence and diagnosis in US states: analysis of health surveys [electronic article]. Popul Health Metr. 2009;7:16. PMID: 19781056.
51. Centers for Disease Control and Prevention. Awareness of family health history as a risk factor for disease—United States, 2004. MMWR Morb Mortal Wkly Rep. 2004;53(44):1044–1047. PMID: 15538320.
52. Suchindran S, Vana AM, Shaffer RA. Racial differences in the interaction between family history and risk factors associated with diabetes in the National Health and Nutritional Examination Survey, 1999–2004. Genet Med. 2009;11(7):542–547. PMID: 19606541.