CDC STACKS serves as an archival repository of CDC-published products including scientific findings, journal articles, guidelines, recommendations, or other public health information authored or co-authored by CDC or funded partners.
As a repository, CDC STACKS retains documents in their original published format to ensure public access to scientific information.
Epidemiologic Utility of a Framework for Partition Number Selection When Dissecting Hierarchically Clustered Genetic Data Evaluated on the Intestinal Parasite Cyclospora cayetanensis
May 5, 2023
Source: Am J Epidemiol. 192(5):772-781
Details:
Alternative Title: Am J Epidemiol
Description: Comparing parasite genotypes to inform parasitic disease outbreak investigations involves computing genetic distances, which are typically analyzed by hierarchical clustering to identify related isolates whose relatedness indicates a common source. A limitation of hierarchical clustering is that hierarchical clusters are not discrete; they are nested. Consequently, small groups of similar isolates exist within larger groups that grow progressively larger as relationships become increasingly distant. Investigators must dissect hierarchical trees at a partition number that ensures grouped isolates belong to the same strain, a process typically performed subjectively, which introduces bias into the resulting groupings. We describe an unbiased, probabilistic framework for partition number selection that ensures partitions comprise isolates that are statistically likely to belong to the same strain. We computed distances and established a normalized distribution of background distances, which we used to demarcate a threshold below which the closeness of relationships is unlikely to be random. The distances were then hierarchically clustered and the dendrogram dissected at a partition number where most within-partition distances fell below the threshold. We evaluated this framework by partitioning 1,137 clustered Cyclospora cayetanensis genotypes, including 552 isolates epidemiologically linked to various outbreaks. The framework was 91% sensitive and 100% specific in assigning epidemiologically linked isolates to the same partition.
PubMed ID: 36617302
PubMed Central ID: PMC10165878
Volume: 192
Issue: 5
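The partition-selection procedure summarized in the Description can be illustrated with a minimal sketch. This is not the authors' implementation: the average-linkage rule, the 5th-percentile threshold, the 95% reading of "most", the toy distance matrix, and every function name below are assumptions made for illustration only.

```python
from itertools import combinations


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def agglomerate(dist):
    """Average-linkage clustering (assumed linkage rule).

    Returns a dict mapping each partition number k to a list of clusters,
    where each cluster is a list of isolate indices.
    """
    clusters = [[i] for i in range(len(dist))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        partitions[len(clusters)] = [sorted(c) for c in clusters]
    return partitions


def fraction_below(partition, dist, threshold):
    """Fraction of within-partition pairwise distances below the threshold."""
    within = [dist[i][j] for c in partition for i, j in combinations(c, 2)]
    if not within:
        return 1.0  # only singletons: no within-partition pairs to violate the rule
    return sum(d < threshold for d in within) / len(within)


def select_partition_number(dist, background, q=5.0, min_fraction=0.95):
    """Pick the smallest partition number whose partitions are mostly 'close'.

    q and min_fraction are hypothetical settings, not values from the paper.
    """
    threshold = percentile(background, q)
    partitions = agglomerate(dist)
    for k in sorted(partitions):  # try the coarsest dissection first
        if fraction_below(partitions[k], dist, threshold) >= min_fraction:
            return k, partitions[k], threshold
    raise ValueError("distance matrix must contain at least one isolate")


# Toy data (invented): isolates 0-2 and 3-5 form two tight groups.
toy = [[0.0 if i == j else (0.1 if i // 3 == j // 3 else 0.9)
        for j in range(6)] for i in range(6)]
# Hypothetical background distances between unrelated isolates.
background = [0.5, 0.6, 0.7, 0.8, 0.9, 0.85, 0.75, 0.65, 0.95, 0.55]

k, partition, threshold = select_partition_number(toy, background)
print(k, sorted(partition), round(threshold, 4))
```

On this toy matrix the sketch selects two partitions, because a single partition would include the large between-group distances, while two partitions contain only distances below the threshold. Scanning from the coarsest dissection upward keeps groups as large as the threshold allows.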