Using electronic health records to identify candidates for human immunodeficiency virus pre-exposure prophylaxis: An application of super learning to risk prediction when the outcome is rare
Supporting Files
-
June 24 2020
-
File Language:
English
Details
-
Alternative Title:Stat Med
-
Personal Author:
-
Description:
Methods:
Data consisted of 180 covariates (demographic, diagnoses, treatments, prescriptions) extracted from records on 399 385 patients (150 cases) seen at Atrius Health (2007–2015), a clinical network in Massachusetts. Super learner is an ensemble machine learning algorithm that uses k-fold cross-validation to evaluate and combine predictions from a collection of algorithms. We trained 42 variants of sophisticated algorithms, using different sampling schemes that more evenly balanced the ratio of cases to controls. We compared super learner’s cross-validated area under the receiver operating characteristic curve (cv-AUC) with that of each individual algorithm.
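As a rough illustration of how held-out predictions from several algorithms can be scored and combined, the sketch below computes AUC from pairwise case-control comparisons and grid-searches a convex weight for blending two candidate learners. The function names, the toy data, and the choice of AUC as the combining criterion are assumptions made for illustration; the abstract does not state which metalearner or loss the study used.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def best_convex_weight(labels, preds_a, preds_b, steps=20):
    """Grid-search w in [0, 1] maximizing the AUC of w*a + (1 - w)*b.

    preds_a and preds_b are assumed to be out-of-fold predictions, that is,
    each prediction came from a model that never saw that row in training.
    """
    best_w, best_auc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        blend = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        score = auc(labels, blend)
        if score > best_auc:  # keep the first weight that reaches the best AUC
            best_w, best_auc = w, score
    return best_w, best_auc


# Toy out-of-fold predictions from two hypothetical learners.
labels = [1, 1, 0, 0, 0, 0]
preds_a = [0.9, 0.2, 0.3, 0.1, 0.1, 0.05]
preds_b = [0.4, 0.8, 0.5, 0.2, 0.1, 0.3]

weight, blended_auc = best_convex_weight(labels, preds_a, preds_b)
```

In this toy example each learner misranks one case-control pair on its own, while a blend of the two ranks every pair correctly. With real data, a blend is not guaranteed to beat the best single algorithm on cross-validated performance.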
Results:
The least absolute shrinkage and selection operator (lasso) using a 1:20 class ratio outperformed the super learner (cv-AUC = 0.86 vs 0.84). A traditional logistic regression model restricted to 23 clinician-selected main terms was slightly inferior (cv-AUC = 0.81).
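A sampling scheme of this kind can be sketched as keeping every case and drawing a random subset of controls. The helper below is hypothetical and reads "1:20" as one case per 20 controls, which is an assumption; the abstract does not describe the study's sampling procedure, including whether sampling was applied only within training folds.

```python
import random


def downsample_controls(labels, controls_per_case=20, seed=0):
    """Return sorted row indices with all cases and a sample of controls.

    Keeps every row labeled 1 and draws, without replacement, at most
    controls_per_case rows labeled 0 for each case.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(controls), controls_per_case * len(cases))
    return sorted(cases + rng.sample(controls, n_keep))


# Toy data: 3 cases among 1003 rows.
toy_labels = [1] * 3 + [0] * 1000
kept = downsample_controls(toy_labels)
n_cases = sum(toy_labels[i] for i in kept)
n_controls = len(kept) - n_cases
```

Because downsampling changes the outcome prevalence, predicted risks from a model fit this way generally need recalibration before they are read as absolute probabilities. Rank-based measures such as AUC are typically computed on data left at the original case-control ratio.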
Conclusion:
Machine learning was successful at developing a model to predict 1-year risk of acquiring HIV based on a physician-curated set of predictors extracted from EHRs.
-
Subjects:
-
Source:Stat Med. 39(23):3059-3073
-
Pubmed ID:32578905
-
Pubmed Central ID:PMC7646998
-
Document Type:
-
Funding:U54GM11567/Rhode Island IDeA-CTR/ ; P30AI042853/Providence/Boston Center for AIDS Research/ ; P30 AI060354/AI/NIAID NIH HHS/United States ; U54 GM115677/GM/NIGMS NIH HHS/United States ; SSuN, CDC-RFA-PS13-1306/CC/CDC HHS/United States ; K23 MH098795/MH/NIMH NIH HHS/United States ; P30 AI042853/AI/NIAID NIH HHS/United States
-
Volume:39
-
Issue:23
-
Collection(s):
-
Main Document Checksum:urn:sha256:fcb1ddac8cc0e16e63dd7baf2c3ab4ef504f930f143d51046725c2278e96e858
CDC STACKS serves as an archival repository of CDC-published products including scientific findings, journal articles, guidelines, recommendations, or other public health information authored or co-authored by CDC or funded partners.
As a repository, CDC STACKS retains documents in their original published format to ensure public access to scientific information.