Data-driven approach for creating synthetic electronic medical records



English

Details:

  • Alternative Title:
    BMC Med Inform Decis Mak
  • Personal Author:
  • Description:
    Background

    New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed.

    Methods

    This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population.
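    The three steps above can be sketched in code. This is only a minimal illustration under stated assumptions, not the authors' implementation: the class `SyntheticPatient`, the functions `generate_patients`, `identify_care_patterns` and `adapt_care_pattern`, the record layout, and the toy "real" records are all hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SyntheticPatient:
    # Hypothetical minimal identity record; the paper's actual fields are not given here.
    patient_id: int
    age: int
    sex: str
    care_events: list = field(default_factory=list)


def generate_patients(n, age_range, rng):
    """Step 1: generate synthetic identities and basic information."""
    low, high = age_range
    return [
        SyntheticPatient(i, rng.randint(low, high), rng.choice(["F", "M"]))
        for i in range(n)
    ]


def identify_care_patterns(real_records, syndrome):
    """Step 2: collect care patterns seen in real records for a similar health problem."""
    return [r["events"] for r in real_records if r["syndrome"] == syndrome]


def adapt_care_pattern(patient, pattern, visit_day):
    """Step 3: attach a care pattern to a synthetic patient, shifting day offsets."""
    patient.care_events = [
        {"day": visit_day + e["offset"], "type": e["type"], "item": e["item"]}
        for e in pattern
    ]
    return patient


# Toy stand-in for real EMR data (entirely made up for illustration).
real_records = [
    {
        "syndrome": "respiratory",
        "events": [
            {"offset": 0, "type": "visit", "item": "outpatient"},
            {"offset": 0, "type": "radiology_order", "item": "chest x-ray"},
            {"offset": 1, "type": "radiology_result", "item": "chest x-ray"},
        ],
    },
    {
        "syndrome": "gastrointestinal",
        "events": [{"offset": 0, "type": "visit", "item": "outpatient"}],
    },
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
patients = generate_patients(3, (4, 11), rng)
patterns = identify_care_patterns(real_records, "respiratory")
records = [adapt_care_pattern(p, rng.choice(patterns), visit_day=10) for p in patients]
```

    Keeping the steps as separate functions mirrors the description: identities are generated independently of care patterns, so the same patterns could in principle be reused for a different synthetic population.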

    Results

    We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results, for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 year age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs, and the errors were subsequently rectified.

    Conclusions

    A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were in the 4-11 year age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.
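    As a rough illustration of the applicability condition above, a candidate data set could be screened for the data elements the abstract lists. The element names in `REQUIRED_ELEMENTS` and the function `missing_elements` are hypothetical; the abstract names these elements only as examples ("such as"), so treating them as a required set is an assumption.

```python
# Hypothetical element names derived from the examples in the abstract.
REQUIRED_ELEMENTS = {
    "laboratory_orders",
    "laboratory_results",
    "radiology_orders",
    "radiology_results",
    "clinical_activity",
    "prescription_orders",
}


def missing_elements(available):
    """Return the assumed-required data elements absent from a data set, sorted by name."""
    return sorted(REQUIRED_ELEMENTS - set(available))


# Example: a data set that lacks radiology and prescription data.
gaps = missing_elements(["laboratory_orders", "laboratory_results", "clinical_activity"])
```

    An empty result would suggest the data set has the same kinds of elements as the pilot data; it says nothing about whether the content of those elements is comparable.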

  • Subjects:
  • Source:
  • Pubmed ID:
    20946670
  • Pubmed Central ID:
    PMC2972239
  • Document Type:
  • Funding:
  • Volume:
    10
  • Collection(s):
  • File Type:
    PDF, 3.16 MB
