Using Synthetic Data to Replace Linkage Derived Elements: A Case Study

File type: PDF (827.70 KB)


English

Details:

  • Alternative Title:
    Health Serv Outcomes Res Methodol
  • Description:
    While record linkage can expand analyses performable from survey microdata, it also incurs greater risk of privacy-encroaching disclosure. One way to mitigate this risk is to replace some of the information added through linkage with synthetic data elements. This paper describes a case study using the National Hospital Care Survey (NHCS), which collects patient records under a pledge of protecting patient privacy from a sample of U.S. hospitals for statistical analysis purposes. The NHCS data were linked to the National Death Index (NDI) to enhance the survey with mortality information. The added information from NDI linkage enables survival analyses related to hospitalization, but as the death information includes dates of death and detailed causes of death, having it joined with the patient records increases the risk of patient re-identification (albeit only for deceased persons). For this reason, an approach was tested to develop synthetic data that uses models from survival analysis to replace vital status and actual dates-of-death with synthetic values and uses classification tree analysis to replace actual causes of death with synthesized causes of death. The degree to which analyses performed on the synthetic data replicate results from analysis on the actual data is measured by comparing survival analysis parameter estimates from both data files. Because synthetic data only have value to the degree that they can be used to produce statistical estimates that are like those based on the actual data, this evaluation is an essential first step in assessing the potential utility of synthetic mortality data.
  • Pubmed ID:
    34737669
  • Pubmed Central ID:
    PMC8563018
  • Volume:
    21
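
The Description outlines the method only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a constant-hazard (exponential) survival model, which the abstract does not specify, and it uses simulated records rather than NHCS or NDI data. All names (`fit_exponential_rate`, `synthesize`, `windows`) are hypothetical. The sketch illustrates the replace-then-compare idea: fit a survival model to the actual vital status and time to death, draw synthetic values from the fitted model, then compare the parameter estimate obtained from each file. Synthesizing causes of death with a classification tree is not covered here.

```python
import random


def fit_exponential_rate(times, died):
    """MLE of a constant hazard under right-censoring: deaths / total follow-up time."""
    return sum(died) / sum(times)


def synthesize(followup_windows, rate, rng):
    """Draw a synthetic (time, vital status) pair for every record.

    A death time is drawn from Exponential(rate). If it falls inside the
    record's follow-up window the record is marked deceased (1); otherwise
    it is censored at the end of the window and marked alive (0).
    """
    times, died = [], []
    for window in followup_windows:
        t = rng.expovariate(rate)
        if t <= window:
            times.append(t)
            died.append(1)
        else:
            times.append(window)
            died.append(0)
    return times, died


rng = random.Random(0)

# Simulated stand-in for the "actual" linked file (assumption: no real data used).
assumed_daily_hazard = 0.002
windows = [rng.uniform(90, 730) for _ in range(5000)]  # follow-up length in days
actual_times, actual_died = synthesize(windows, assumed_daily_hazard, rng)

# Step 1: fit the survival model to the actual data.
actual_rate = fit_exponential_rate(actual_times, actual_died)

# Step 2: replace vital status and time to death with draws from the fitted model.
synth_times, synth_died = synthesize(windows, actual_rate, rng)

# Step 3: compare the parameter estimate from each file.
synth_rate = fit_exponential_rate(synth_times, synth_died)
relative_diff = abs(synth_rate - actual_rate) / actual_rate

print(f"actual rate:    {actual_rate:.5f}")
print(f"synthetic rate: {synth_rate:.5f}")
print(f"relative diff:  {relative_diff:.3%}")
```

A small relative difference suggests that, for this one parameter, analyses on the synthetic file would track those on the actual file. A fuller evaluation would repeat the comparison across covariate-adjusted models and multiple synthetic draws.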
