Back to Previous Page

A UMLS-based spell checker for natural language processing in vaccine safety

Feb 12 2007

By Tolentino, Herman D. ; Matters, Michael D. ; Walop, Wikke ; ...

Source: BMC Med Inform Decis Mak. 2007; 7:3.

[PDF-519.14 KB]

English

Details Supporting Files You May Also Like

Details:

Alternative Title:

BMC Med Inform Decis Mak

Personal Author:

Tolentino, Herman D. ; Matters, Michael D. ; Walop, Wikke ; Law, Barbara ; Tong, Wesley ; ... More +

Description:

Background

The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports.

Methods

We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction.

Results

We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively.

Conclusion

We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.

Subjects:

[+]

Source:

BMC Med Inform Decis Mak. 2007; 7:3.

Document Type:

Journal Article

Collection(s):

CDC Public Access

Main Document Checksum:

[+]

Download URL:

https://stacks.cdc.gov/view/cdc/3476/cdc_3476_DS1.pdf

File Type:

	1472-6947-7-3-S11.gz	gz
	1472-6947-7-3-S12.gz	gz
	1472-6947-7-3-S13.gz	gz
	1472-6947-7-3-S14.gz	gz
	1472-6947-7-3-S15.gz	gz
	1472-6947-7-3-S16.gz	gz
	1472-6947-7-3-S17.gz	gz
	1472-6947-7-3-S18.gz	gz
	1472-6947-7-3-S2.zip	zip
	1472-6947-7-3-S3.txt	txt
	1472-6947-7-3-1.gif	gif
	1472-6947-7-3-S4.gz	gz
	1472-6947-7-3-S5.gz	gz
	1472-6947-7-3-S6.gz	gz
	1472-6947-7-3-S7.php	txt
	1472-6947-7-3-S8.gz	gz
	1472-6947-7-3-S9.gz	gz
	1472-6947-7-3.nxml	txt
	license.txt	txt
	1472-6947-7-3-1.jpg	jpeg
	1472-6947-7-3-2.gif	gif
	1472-6947-7-3-2.jpg	jpeg
	1472-6947-7-3-3.gif	gif
	1472-6947-7-3-3.jpg	jpeg
	1472-6947-7-3-S1.doc	doc
	1472-6947-7-3-S10.gz	gz

More +

A UMLS-based spell checker for natural language processing in vaccine safety

Details:

Supporting Files

You May Also Like

Have Questions?

CDC INFORMATION

CONNECT WITH CDC