U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs

  • Details:

    • Alternative Title:
      J Comput Biol
    • Description:
      Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step in full genome analysis is de novo assembly, and the performance of different assembly methods is currently measured by a metric called N50. However, the N50 value can be skewed and inaccurate when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U50. U50 identifies unique, target-specific contigs by using a reference genome as a baseline, with the aim of circumventing some limitations inherent to the N50 metric. Specifically, the U50 program removes overlapping sequence shared by multiple contigs by using a mask array, so that assembly performance is measured only by unique contigs. We compared simulated and real datasets using U50 and N50, and our results demonstrated that U50 has the following advantages over N50: (1) reducing erroneously large N50 values due to a poor assembly, (2) eliminating overinflated N50 values caused by large measurements from overlapping contigs, (3) eliminating diminished N50 values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG50%. The U50 metric allows for a more accurate measure of assembly performance because it analyzes only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing datasets have high background noise (i.e., host and other non-target sequence), which contributes to a skewed, misrepresented N50 value; U50 corrects for this. Finally, UG50% can be used to compare assembly results from different samples or studies, a cross-comparison that cannot be performed with N50.
    • Source:
      J Comput Biol. 24(11):1071-1080
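The contrast between N50 and a mask-based metric described in the abstract can be illustrated with a short Python sketch. This is not the authors' program: the function names (`n50`, `unique_lengths`, `u50`), the half-open `(start, end)` alignment format, and the choice to process contigs from longest to shortest are all assumptions made for illustration. The sketch also assumes that contigs have already been aligned to the reference, so non-target contigs are simply absent from the input.

```python
def n50(lengths):
    """Standard N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def unique_lengths(alignments, genome_length):
    """Count, per contig, only the reference positions not already covered.

    `alignments` holds hypothetical (start, end) half-open, 0-based
    reference intervals, one per aligned contig. Longest-first ordering
    is an assumption of this sketch.
    """
    mask = [False] * genome_length  # True once a reference position is covered
    result = []
    by_length = sorted(alignments, key=lambda iv: iv[1] - iv[0], reverse=True)
    for start, end in by_length:
        start, end = max(start, 0), min(end, genome_length)
        new_bases = 0
        for pos in range(start, end):
            if not mask[pos]:
                mask[pos] = True
                new_bases += 1
        if new_bases:  # contigs that add nothing new are dropped
            result.append(new_bases)
    return result


def u50(alignments, genome_length):
    """U50-style value: N50 computed over unique, non-overlapping lengths."""
    return n50(unique_lengths(alignments, genome_length))


# Three redundant 300-base contigs plus seven 100-base contigs that tile
# the rest of a hypothetical 1000-base reference.
alignments = [(0, 300)] * 3 + [(s, s + 100) for s in range(300, 1000, 100)]
raw_n50 = n50([end - start for start, end in alignments])
masked_u50 = u50(alignments, 1000)
print(raw_n50, masked_u50)
```

In this toy input the redundant copies of the 300-base contig inflate the raw N50, while the masked calculation counts that region only once, which is the overlap effect the abstract lists as advantage (2).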
