Accommodating error analysis in comparison and clustering of molecular fingerprints

H. Salamon (hugh@molepi.stanford.edu), M. R. Segal, A. Ponce de Leon, P. M. Small
University of California, San Francisco, USA

Research Article. Emerging Infectious Diseases (Emerg Infect Dis), Apr-Jun 1998; 4(2): 159-168. Centers for Disease Control. ISSN 1080-6040, 1080-6059. Article IDs: 9621186, 2640126.

Molecular epidemiologic studies of infectious diseases rely on pathogen genotype comparisons, which usually yield patterns comprising sets of DNA fragments (DNA fingerprints). We use a highly developed genotyping system, IS6110-based restriction fragment length polymorphism analysis of Mycobacterium tuberculosis, to develop a computational method that automates comparison of large numbers of fingerprints. Because error in fragment length measurements is proportional to fragment length and is positively correlated for fragments within a lane, an align-and-count method that compensates for relative scaling of lanes reliably counts matching fragments between lanes. Results of a two-step method we developed to cluster identical fingerprints agree closely with 5 years of computer-assisted visual matching among 1,335 M. tuberculosis fingerprints. Fully documented and validated methods of automated comparison and clustering will greatly expand the scope of molecular epidemiology.
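The align-and-count idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 1% relative tolerance, the 5% scaling range, and the grid search over scale factors are all assumptions chosen for illustration. The sketch compares fragment lengths on a log scale, so that a fixed tolerance corresponds to an error proportional to fragment length, and it tries a range of lane-wide scale factors to compensate for error that is correlated within a lane.

```python
import math


def count_matches(lane_a, lane_b, scale=1.0, rel_tol=0.01):
    """Greedily count one-to-one fragment matches between two lanes.

    Lengths are compared in log space, so rel_tol acts as a relative
    (length-proportional) tolerance. rel_tol is a hypothetical value.
    """
    a = sorted(math.log(x) for x in lane_a)
    b = sorted(math.log(x * scale) for x in lane_b)
    tol = math.log1p(rel_tol)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tol:
            # Fragments agree within tolerance: pair them and move on.
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too short to match any remaining b fragment
        else:
            j += 1  # b[j] is too short to match any remaining a fragment
    return matches


def align_and_count(lane_a, lane_b, max_shift=0.05, steps=41, rel_tol=0.01):
    """Try a grid of lane-wide scale factors and keep the best match count.

    max_shift and steps are illustrative assumptions. Ties are broken in
    favor of the scale factor closest to 1 (the least rescaling).
    Returns (best_match_count, best_scale).
    """
    best_count, best_scale = -1, 1.0
    for k in range(steps):
        scale = 1.0 - max_shift + 2.0 * max_shift * k / (steps - 1)
        m = count_matches(lane_a, lane_b, scale, rel_tol)
        closer = abs(scale - 1.0) < abs(best_scale - 1.0)
        if m > best_count or (m == best_count and closer):
            best_count, best_scale = m, scale
    return best_count, best_scale


# Made-up example: lane_b is lane_a with every fragment read 3% too long.
lane_a = [1000, 2000, 4000, 8000]
lane_b = [1030, 2060, 4120, 8240]
print(count_matches(lane_a, lane_b))
print(align_and_count(lane_a, lane_b))
```

With these made-up lanes, the unscaled comparison misses every band because the 3% lane-wide offset exceeds the tolerance, while the aligned comparison recovers all four matches. A real analysis would set the tolerance and scaling range from measured error in replicate lanes rather than from the fixed values assumed here.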