A machine learning model for predicting congenital heart defects from administrative data

File Language:
English


Details

  • Alternative Title:
    Birth Defects Res
  • Description:
    Introduction:

    International Classification of Diseases (ICD) codes recorded in administrative data are often used to identify congenital heart defects (CHD). However, these codes do not always correctly identify individuals who truly have CHD (true positives, TP). Machine learning (ML) algorithms that identify CHD more accurately in administrative records could strengthen CHD surveillance.

    Methods:

    To identify features relevant to accurate CHD identification, traditional ML models were applied to a validated dataset of 779 patients. Encounter-level data, including ICD-9-CM and CPT codes, from 2011 to 2013 at four US sites were used. Five-fold cross-validation determined the overlapping important features that best predicted TP CHD individuals. Median values and 95% confidence intervals (CIs) of area under the receiver operating characteristic curve, positive predictive value (PPV), negative predictive value, sensitivity, specificity, and F1-score were compared across four ML models: Logistic Regression, Gaussian Naive Bayes, Random Forest, and eXtreme Gradient Boosting (XGBoost).

    Results:

    Expert clinician validation of ICD-9-CM CHD-related codes gave a baseline PPV of 76.5%. Feature selection for ML reduced 7138 features to the 10 that best predicted TP CHD cases. During training and testing, XGBoost performed best on median F1-score and PPV: 0.84 (95% CI: 0.76, 0.91) and 0.94 (95% CI: 0.91, 0.96), respectively. When applied to the entire dataset, XGBoost yielded a median PPV of 0.94 (95% CI: 0.94, 0.95).

    Conclusions:

    Applying ML algorithms improved the accuracy of identifying TP CHD cases in comparison to ICD codes alone. Use of this technique to identify CHD cases would improve generalizability of results obtained from large datasets to the CHD patient population, enhancing public health surveillance efforts.

  • Source:
    Birth Defects Res. 115(18):1693-1707
  • Pubmed ID:
    37681293
  • Pubmed Central ID:
    PMC10841295
  • Volume:
    115
  • Issue:
    18
  • Main Document Checksum:
    urn:sha-512:9de98ef784a552559fd9923b78436e5241eb49a427cdfced59d7eb06d0d97dff2f8f91c69a75d382e157e8849a95151721ecaaa18406c1027da933c35e6f5f3a
  • File Type:
    PDF (1.36 MB)
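
The Methods paragraph names several evaluation metrics (PPV, negative predictive value, sensitivity, specificity, F1-score) and reports their medians across five cross-validation folds. The sketch below shows how those quantities are conventionally computed from binary labels, and how a per-fold median could be taken. It is an illustration only, not the authors' pipeline: the function names, the shuffling scheme, the seed, and the toy labels are all assumptions, no model is trained here, and the abstract does not state how its confidence intervals were derived, so none are computed.

```python
import math
import random
from statistics import median


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def confusion_metrics(y_true, y_pred):
    """Compute PPV, NPV, sensitivity, specificity, and F1 from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once, then deal them into k nearly equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def median_metric_across_folds(y_true, y_pred, metric="ppv", k=5, seed=0):
    """Median of one metric evaluated separately on each of k test folds.

    Assumes y_pred already holds predictions for every sample (hypothetical
    setup); a real cross-validation would refit the model for each fold.
    """
    per_fold = []
    for fold in k_fold_indices(len(y_true), k, seed):
        scores = confusion_metrics([y_true[i] for i in fold],
                                   [y_pred[i] for i in fold])
        per_fold.append(scores[metric])
    return median(per_fold)


if __name__ == "__main__":
    # Toy labels (made up): 1 = clinician-confirmed CHD, 0 = not CHD.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 1, 0, 1]
    print(confusion_metrics(y_true, y_pred)["ppv"])  # 0.75
```

PPV is the metric most directly comparable to the 76.5% baseline, since both answer the same question: of the records flagged as CHD, what fraction are confirmed cases.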