
U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs

File type: PDF (652.84 KB)


  • English

  • Details:

    • Alternative Title:
      J Comput Biol
    • Description:
      Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step for full genome analysis is de novo assembly, and currently the performance of different assembly methods is measured by a metric called N50. However, the N50 value can produce skewed, inaccurate results when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U50. The U50 identifies unique, target-specific contigs by using a reference genome as a baseline, aiming to circumvent some limitations inherent to the N50 metric. Specifically, the U50 program removes overlapping sequence shared by multiple contigs by using a mask array, so the performance of the assembly is measured only by unique contigs. We compared simulated and real datasets by using U50 and N50, and our results demonstrated that U50 has the following advantages over N50: (1) reducing erroneously large N50 values due to a poor assembly, (2) eliminating overinflated N50 values caused by large measurements from overlapping contigs, (3) eliminating diminished N50 values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG50%. The U50 metric allows a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing data have high background noise (i.e., host and other non-target sequences), which skews the N50 value and misrepresents the assembly; U50 corrects for this. Also, UG50% can be used to compare assembly results from different samples or studies, a cross-comparison that cannot be performed with N50.
    • Pubmed ID:
      28418726
    • Pubmed Central ID:
      PMC5783553
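
The abstract describes U50 only in outline. The Python sketch below illustrates the mask-array idea under several assumptions that this record does not state: contigs are assumed to be already aligned to the reference as half-open (start, end) coordinates, with non-target (e.g., host) contigs discarded beforehand; the mask is filled longest contig first; U50 is taken as an N50-style statistic over unique lengths, a helper UG50 uses half the reference length as its threshold, and UG50% is assumed to be 100 × UG50 / reference length. All function names are hypothetical, and this is not the authors' program.

```python
from typing import List, Sequence, Tuple

# Hypothetical input type: half-open (start, end) coordinates of one contig
# aligned to the reference genome.
Interval = Tuple[int, int]


def length_at_threshold(lengths: Sequence[int], threshold: float) -> int:
    """Walk lengths from longest to shortest and return the length at which the
    running total first reaches `threshold` (0 if it never does)."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return 0


def n50(contig_lengths: Sequence[int]) -> int:
    """Conventional N50: threshold is half of the total assembly length,
    so overlapping and off-target contigs all count."""
    return length_at_threshold(contig_lengths, sum(contig_lengths) / 2)


def unique_contig_lengths(alignments: Sequence[Interval], reference_length: int) -> List[int]:
    """Mask-array step: each contig keeps only the reference positions that no
    previously processed contig has claimed. Processing order (longest first)
    is an assumption of this sketch."""
    mask = bytearray(reference_length)  # 0 = unclaimed, 1 = claimed
    unique: List[int] = []
    for start, end in sorted(alignments, key=lambda iv: iv[1] - iv[0], reverse=True):
        new_bases = 0
        for pos in range(max(start, 0), min(end, reference_length)):
            if not mask[pos]:
                mask[pos] = 1
                new_bases += 1
        if new_bases:  # contigs fully covered by earlier ones contribute nothing
            unique.append(new_bases)
    return unique


def u50(alignments: Sequence[Interval], reference_length: int) -> int:
    """Assumed U50: N50-style statistic computed over unique lengths only."""
    unique = unique_contig_lengths(alignments, reference_length)
    return length_at_threshold(unique, sum(unique) / 2)


def ug50(alignments: Sequence[Interval], reference_length: int) -> int:
    """Assumed UG50 (hypothetical helper): threshold is half of the reference length."""
    unique = unique_contig_lengths(alignments, reference_length)
    return length_at_threshold(unique, reference_length / 2)


def ug50_percent(alignments: Sequence[Interval], reference_length: int) -> float:
    """Assumed UG50%: UG50 expressed as a percentage of the reference length."""
    return 100.0 * ug50(alignments, reference_length) / reference_length
```

For example, with a 100 bp reference and contigs aligned at (0, 60), (50, 100) and (10, 20), the mask keeps 60 and 40 unique bases and drops the fully contained contig, whereas N50 computed on the raw contig lengths still counts all 120 bases. Expressing the result as a percentage of the reference length is what would make values comparable across samples with different reference sizes.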
