Using supervised machine learning to identify efficient blocking schemes for record linkage
Supporting Files
-
Jun 03 2021
-
File Language:
English
Details
-
Alternative Title:Stat J IAOS
-
Personal Author:
-
Description:Record linkage enables survey data to be integrated with other data sources, expanding the analytic potential of both sources. However, depending on the number of records being linked, the processing time can be prohibitive. This paper describes a case study using a supervised machine learning algorithm known as the Sequential Coverage Algorithm (SCA). The SCA was used to develop the join strategy for linking two data sources: the National Center for Health Statistics' (NCHS) 2016 National Hospital Care Survey (NHCS) and the Centers for Medicare & Medicaid Services (CMS) Enrollment Database (EDB). Because of the size of the CMS data, common record joining methods (i.e., blocking) were used to reduce the number of record pairs that must be evaluated while still identifying the vast majority of matches. NCHS conducted a case study examining how the SCA improved the efficiency of blocking. This paper describes how the SCA was used to design the blocking scheme for this linkage.
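As a rough illustration of the idea in the description, the sketch below shows how a greedy, sequential-covering style procedure might choose blocking keys from a set of known matches. It is a simplified toy under stated assumptions, not the implementation used by NCHS: the records, the field names (`last`, `dob`, `zip`), the candidate keys, and the helper names `candidate_pairs` and `learn_blocking_scheme` are all hypothetical.

```python
from itertools import product

# Toy records; all field names and values are invented for illustration.
survey = [
    {"id": "s1", "last": "smith", "dob": "1950-01-02", "zip": "30301"},
    {"id": "s2", "last": "jones", "dob": "1948-07-19", "zip": "30302"},
    {"id": "s3", "last": "brown", "dob": "1939-03-05", "zip": "30303"},
]
admin = [
    {"id": "a1", "last": "smith", "dob": "1950-01-02", "zip": "30309"},
    {"id": "a2", "last": "jonas", "dob": "1948-07-19", "zip": "30302"},
    {"id": "a3", "last": "brown", "dob": "1939-05-03", "zip": "30303"},
    {"id": "a4", "last": "smith", "dob": "1961-11-30", "zip": "30301"},
]

# Labeled training data: pairs known to refer to the same person.
true_matches = {("s1", "a1"), ("s2", "a2"), ("s3", "a3")}

# Candidate blocking keys: two records are compared only if their keys agree.
blocking_keys = {
    "last": lambda r: r["last"],
    "dob": lambda r: r["dob"],
    "zip": lambda r: r["zip"],
    "last+zip": lambda r: (r["last"], r["zip"]),
}


def candidate_pairs(key):
    """Return the (survey id, admin id) pairs that share the same blocking key."""
    return {(s["id"], a["id"]) for s, a in product(survey, admin) if key(s) == key(a)}


def learn_blocking_scheme(keys, matches, min_coverage=1.0):
    """Greedily add keys until the requested share of true matches is covered."""
    pairs_by_key = {name: candidate_pairs(fn) for name, fn in keys.items()}
    uncovered = set(matches)
    allowed_misses = (1.0 - min_coverage) * len(matches)
    scheme = []
    while len(uncovered) > allowed_misses:
        remaining = [name for name in pairs_by_key if name not in scheme]
        if not remaining:
            break
        # Prefer the key that covers the most uncovered matches,
        # breaking ties in favor of the key that generates fewer pairs.
        best = max(
            remaining,
            key=lambda n: (len(pairs_by_key[n] & uncovered), -len(pairs_by_key[n])),
        )
        if not pairs_by_key[best] & uncovered:
            break  # no remaining key covers anything new
        scheme.append(best)
        uncovered -= pairs_by_key[best]
    return scheme, uncovered


scheme, missed = learn_blocking_scheme(blocking_keys, true_matches)

# The final scheme is the union (logical OR) of the selected blocking passes.
candidates = set().union(*(candidate_pairs(blocking_keys[n]) for n in scheme))
print(scheme, len(candidates), len(survey) * len(admin))
```

In this toy data the selected passes cover every known match while generating far fewer candidate pairs than the full cross product, which is the trade-off blocking is meant to achieve. A production scheme would also need to bound the number of pairs each pass may generate and be validated on held-out data.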
-
Subjects:
-
Source:Stat J IAOS. 37(2):673-680
-
Pubmed ID:34413910
-
Pubmed Central ID:PMC8371678
-
Document Type:
-
Funding:
-
Volume:37
-
Issue:2
-
Collection(s):
-
Main Document Checksum:urn:sha256:9c816338501b3228627173e59d9d81933243d638fbe869da9d2179f52ac10fc7
-
Download URL:
-
File Type:
CDC STACKS serves as an archival repository of CDC-published products, including scientific findings, journal articles, guidelines, recommendations, or other public health information authored or co-authored by CDC or funded partners.
As a repository, CDC STACKS retains documents in their original published format to ensure public access to scientific information.