<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="abstract"><?properties open_access?><front><journal-meta><journal-id journal-id-type="nlm-ta">Online J Public Health Inform</journal-id><journal-id journal-id-type="iso-abbrev">Online J Public Health Inform</journal-id><journal-id journal-id-type="publisher-id">OJPHI</journal-id><journal-title-group><journal-title>Online Journal of Public Health Informatics</journal-title></journal-title-group><issn pub-type="epub">1947-2579</issn><publisher><publisher-name>University of Illinois at Chicago Library</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="pmc">6088080</article-id><article-id pub-id-type="publisher-id">ojphi-10-e201</article-id><article-id pub-id-type="doi">10.5210/ojphi.v10i1.9122</article-id><article-categories><subj-group subj-group-type="heading"><subject>ISDS 2018 Conference Abstracts</subject></subj-group></article-categories><title-group><article-title>Data Quality Improvements in National Syndromic Surveillance Program
(NSSP) Data</article-title></title-group><contrib-group><contrib contrib-type="author"><name><surname>Ejigu</surname><given-names>Girum S.</given-names></name><xref ref-type="corresp" rid="cor1">*</xref></contrib><contrib contrib-type="author"><name><surname>Radhakrishnan</surname><given-names>Lakshmi</given-names></name></contrib><contrib contrib-type="author"><name><surname>McMurray</surname><given-names>Paul</given-names></name></contrib><contrib contrib-type="author"><name><surname>English</surname><given-names>Roseanne</given-names></name></contrib><aff id="aff1">Division of Health Informatics and Surveillance, Center for
Surveillance, Epidemiology, and Laboratory Services, <institution>Centers for
Disease Control and Prevention (CDC)</institution>, <addr-line>Atlanta,
GA</addr-line>, <country>USA</country></aff></contrib-group><author-notes><corresp id="cor1"><label>*</label><bold>Girum S. Ejigu</bold> E-mail: <email xlink:href="kwa7@cdc.gov">kwa7@cdc.gov</email></corresp></author-notes><pub-date pub-type="epub"><day>30</day><month>5</month><year>2018</year></pub-date><pub-date pub-type="collection"><year>2018</year></pub-date><volume>10</volume><issue>1</issue><elocation-id>e201</elocation-id><permissions><license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by-nc/3.0/"><license-p>ISDS Annual Conference Proceedings 2018. This is an Open Access
article distributed under the terms of the Creative Commons
Attribution-Noncommercial 3.0 Unported License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0/">http://creativecommons.org/licenses/by-nc/3.0/</ext-link>), permitting
all non-commercial use, distribution, and reproduction in any medium,
provided the original work is properly cited.</license-p></license></permissions><kwd-group kwd-group-type="author"><title>Keywords </title><kwd>NSSP</kwd><kwd>Data Quality</kwd><kwd>Completeness</kwd><kwd>Chief Complaint</kwd><kwd>Discharge Diagnosis</kwd></kwd-group></article-meta></front><body><sec><title>Objective</title><p>Review the impact of applying regular data quality checks to assess completeness of
core data elements that support syndromic surveillance.</p></sec><sec sec-type="intro"><title>Introduction</title><p>The National Syndromic Surveillance Program (NSSP) is a community-focused
collaboration among federal, state, and local public health agencies and partners
for timely exchange of syndromic data. These data, captured in nearly real time, are
intended to improve the nation&#x02019;s situational awareness and responsiveness to
hazardous events and disease outbreaks. During CDC&#x02019;s previous implementation
of a syndromic surveillance system (BioSense 2), there was a reported lack of
transparency and sharing of information on the data processing applied to data
feeds, encumbering the identification and resolution of data quality issues. The
BioSense Governance Group Data Quality Workgroup paved the way to rethink
surveillance data flow and quality. Their work and collaboration with state and
local partners led to NSSP redesigning the program&#x02019;s data flow. The new data
flow provided a ripe opportunity for NSSP analysts to study the data landscape
(e.g., capturing of HL7 messages and core data elements), assess end-to-end data
flow, and make adjustments to ensure all data being reported were processed, stored,
and made accessible to the user community. In addition, NSSP extensively documented
the new data flow, providing the transparency the community needed to better
understand the disposition of facility data. Even with the improved data flow,
data quality issues that had existed in the past but gone unreported persisted
in the new data. The difference was that these issues could now be identified:
unlike previous versions, the newly designed data flow provided opportunities to
report and act on issues found in the data. Therefore, an important component of the NSSP data
flow was the implementation of regularly scheduled standard data quality checks, and
release of standard data quality reports summarizing data quality findings.</p></sec><sec sec-type="methods"><title>Methods</title><p>NSSP data were assessed for the national-level completeness of chief complaint and
discharge diagnosis data. Completeness is the rate of non-null values (Batini et
al., 2009). Here, it was defined as the percent of visits (e.g., emergency department,
urgent care center) with a non-null value in at least one of the records
associated with the visit. National completeness rates for visits in 2016 were
compared with completeness rates of visits in 2017 (a partial year including visits
through August 2017). In addition, facility-level progress was quantified after
scoring each facility based on the percent completeness change between 2016 and
2017. Legacy data processed prior to introducing the new NSSP data flow were not
included in this assessment.</p></sec><sec sec-type="results"><title>Results</title><p>Nationally, the percent completeness of chief complaint for visits in 2016 was 82.06%
(N=58,192,721), and the percent completeness of chief complaint for visits in 2017
was 87.15% (N=80,603,991). Of the 2,646 facilities that sent data for visits in 2016 and
2017, 114 (4.31%) facilities showed an increase of at least 10% in chief complaint
completeness in 2017 compared with 2016. For discharge diagnosis, national percent
completeness was 50.83% (N=36,048,334) for 2016 visits and 59.23% (N=54,776,310)
for 2017 visits. Of the 2,646 facilities that sent data for visits in 2016
and 2017, 306 (11.56%) facilities showed more than a 10% increase in percent
completeness of discharge diagnosis in 2017 compared with 2016.</p></sec><sec sec-type="conclusions"><title>Conclusions</title><p>The newly designed NSSP data flow provided more opportunity to identify data quality
issues. By applying data quality checks within the newly designed NSSP data flow,
data quality issues related to HL7 messages and processed data could be identified
early. Improvements in data quality were demonstrated by measuring percent
completeness of chief complaint and discharge diagnosis data in 2017 and comparing
it with completeness in 2016. Overall, several factors helped improve data quality:
implementation of routine and targeted data quality checks; investigation of the
root cause of data quality issues; and communication of such findings by engaging
the NSSP team, sites, and vendors.</p></sec></body><back><ack><title>Acknowledgments</title><p>We thank Paula Yoon, David Walker, Michael Coletta, Alan Davis, Niketta Womack, NSSP
Partners and BioSense Governance Group.</p></ack><ref-list><title>References</title><ref id="r1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Batini</surname><given-names>C</given-names></name><name><surname>Cappiello</surname><given-names>C</given-names></name><name><surname>Francalanci</surname><given-names>C</given-names></name><name><surname>Maurino</surname><given-names>A</given-names></name></person-group>
<year>2009</year>
<article-title>Methodologies for data quality assessment
and improvement</article-title>. <source>ACM Comput Surv</source>.
<volume>41</volume>(<issue>3</issue>), <fpage>1</fpage>-<lpage>52</lpage>.
<pub-id pub-id-type="doi">10.1145/1541880.1541883</pub-id></mixed-citation></ref></ref-list></back></article>