U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs

Details:

  • Alternative Title:
    J Comput Biol
  • Description:
    Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step in full-genome analysis is de novo assembly, and the performance of different assembly methods is currently measured by a metric called N50. However, the N50 value can be skewed and inaccurate when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U50. U50 identifies unique, target-specific contigs by using a reference genome as a baseline, with the aim of circumventing some limitations inherent to the N50 metric. Specifically, the U50 program removes overlapping sequence shared by multiple contigs by using a mask array, so that assembly performance is measured only by unique contigs. We compared simulated and real datasets using U50 and N50, and our results demonstrated that U50 has the following advantages over N50: (1) reducing erroneously large N50 values due to a poor assembly, (2) eliminating overinflated N50 values caused by large measurements from overlapping contigs, (3) eliminating diminished N50 values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG50%. The U50 metric allows a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing datasets have high background noise (i.e., host and other non-target sequences), which contributes to a skewed, misrepresented N50 value; this is corrected by U50. Also, UG50% can be used to compare assembly results from different samples or studies, a cross-comparison that cannot be performed with N50.
  • Pubmed ID:
    28418726
  • Pubmed Central ID:
    PMC5783553
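The mask-array idea in the description can be illustrated with a minimal Python sketch. This is not the authors' U50 program; it is one plausible reading of the abstract, under stated assumptions: each target-specific contig is represented by a hypothetical half-open interval (start, end) giving where it aligns on the reference, contigs are processed from longest to shortest, and a contig contributes only the reference positions that no earlier contig has already masked. The function names n50, unique_lengths, and u50 are illustrative, and UG50% is left out because the abstract does not define it.

```python
from typing import List, Tuple


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def unique_lengths(alignments: List[Tuple[int, int]], ref_len: int) -> List[int]:
    """Count, per contig, the reference positions not already covered by a longer contig.

    Assumption: alignments are half-open (start, end) intervals on the reference.
    """
    mask = [False] * ref_len  # True once a reference position has been counted
    result = []
    # Longest contigs first, so shorter overlapping contigs lose the shared bases.
    for start, end in sorted(alignments, key=lambda a: a[1] - a[0], reverse=True):
        new_bases = 0
        for pos in range(max(start, 0), min(end, ref_len)):
            if not mask[pos]:
                mask[pos] = True
                new_bases += 1
        if new_bases > 0:  # fully redundant contigs are dropped
            result.append(new_bases)
    return result


def u50(alignments: List[Tuple[int, int]], ref_len: int) -> int:
    """N50-style statistic computed only over unique, reference-mapped bases."""
    return n50(unique_lengths(alignments, ref_len))


# Two target contigs plus 50 short background (non-target) contigs.
target = [(0, 1000), (1000, 1800)]
background_lengths = [100] * 50
all_lengths = [end - start for start, end in target] + background_lengths

print(n50(all_lengths))   # 100: dragged down by the many small background contigs
print(u50(target, 2000))  # 1000: only unique, target-specific bases are counted
```

In this toy case the background contigs never map to the reference, so they affect N50 but not the mask-based value, which mirrors advantage (3) in the description.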
