Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques

Xie, Zidian; Nikolayeva, Olga; Luo, Jiebo; Li, Dongmei

Public Domain

September 19 2019

File Language:

English

Details

Journal Article:

Preventing Chronic Disease (PCD)
Personal Author:

Xie, Zidian; Nikolayeva, Olga; Luo, Jiebo; Li, Dongmei
Description:

Introduction

As one of the most prevalent chronic diseases in the United States, diabetes, especially type 2 diabetes, affects the health of millions of people and puts an enormous financial burden on the US economy. We aimed to develop predictive models to identify risk factors for type 2 diabetes, which could help facilitate early diagnosis and intervention and also reduce medical costs.

Methods

We analyzed cross-sectional data on 138,146 participants, including 20,467 with type 2 diabetes, from the 2014 Behavioral Risk Factor Surveillance System. We built several machine learning models for predicting type 2 diabetes, including support vector machine, decision tree, logistic regression, random forest, neural network, and Gaussian Naive Bayes classifiers. We used univariable and multivariable weighted logistic regression models to investigate the associations of potential risk factors with type 2 diabetes.
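To illustrate the weighted logistic regression step, here is a minimal sketch that fits a single binary risk factor by Newton-Raphson and reports its odds ratio. It is not the authors' code: the records, the `exposed` indicator, and the survey weights are hypothetical, and the model is univariable and unadjusted. The multivariable models and design-based confidence intervals described in the abstract are not reproduced here.

```python
import math


def fit_weighted_logistic(xs, ys, ws, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson, weighting each record by ws."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the weighted log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the negative Hessian
        for x, y, w in zip(xs, ys, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)
            g1 += w * (y - p) * x
            v = w * p * (1.0 - p)
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (negative Hessian) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical records: (exposed, has_diabetes, survey_weight).
records = [
    (1, 1, 12.0), (1, 1, 18.0), (1, 0, 70.0),
    (0, 1, 10.0), (0, 0, 40.0), (0, 0, 50.0),
]
exposed = [r[0] for r in records]
diabetes = [r[1] for r in records]
weights = [r[2] for r in records]

intercept, slope = fit_weighted_logistic(exposed, diabetes, weights)
odds_ratio = math.exp(slope)  # unadjusted odds ratio for the exposure
print(f"unadjusted weighted odds ratio: {odds_ratio:.3f}")
```

With one binary predictor, the fitted odds ratio equals the cross-product ratio of the weighted 2x2 table, which gives a quick check on the fit. An adjusted odds ratio would require adding the other covariates to the model.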

Results

All predictive models for type 2 diabetes achieved a high area under the curve (AUC), ranging from 0.7182 to 0.7949. Although the neural network model had the highest accuracy (82.4%), specificity (90.2%), and AUC (0.7949), the decision tree model had the highest sensitivity (51.6%) for type 2 diabetes. We found that people who slept 9 or more hours per day (adjusted odds ratio [aOR] = 1.13; 95% confidence interval [CI], 1.03–1.25) or had a checkup frequency of less than 1 year (aOR = 2.31; 95% CI, 1.86–2.85) had a higher risk for type 2 diabetes.
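The four metrics reported above can be computed from true labels and predicted scores, as in the following minimal sketch. The labels, scores, and the 0.5 decision threshold are hypothetical and chosen only for illustration; they are not data from the study. AUC is computed here as the fraction of (case, non-case) pairs in which the case receives the higher score, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of case and non-case scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.8, 0.35, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can lead on AUC while another leads on sensitivity.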

Conclusion

Of the 8 predictive models, the neural network model performed best, with the highest AUC value; however, the decision tree model is preferred for initial screening for type 2 diabetes because it had the highest sensitivity and, therefore, the highest detection rate. We confirmed previously reported risk factors and also identified sleeping time and frequency of checkup as 2 new potential risk factors related to type 2 diabetes.
Subjects:

Original Research
Source:

Prev Chronic Dis. 16
ISSN:

1545-1151
Pubmed ID:

31538566
Pubmed Central ID:

PMC6795062
Document Type:

Journal Article
Volume:

16
Collection(s):

Preventing Chronic Disease
Main Document Checksum:

urn:sha-512:38c05fa0eabfff97cf32acef3f05c5bea8fe9f9e0249acb7fdcd91fc51741ba5eea2374adebb22c91a454894bdf5bc976a6e7ad8722325e1760d04f47306ce4c
Download URL:

https://stacks.cdc.gov/view/cdc/83070/cdc_83070_DS1.pdf
File Type:

[PDF - 278.12 KB]

PCD-16-E130.nxml
