TY - GEN
T1 - PERGA
T2 - 2013 4th ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
AU - Zhu, Xiao
AU - Leung, Henry C.M.
AU - Chin, Francis Y.L.
AU - Yiu, Siu Ming
AU - Quan, Guangri
AU - Liu, Bo
AU - Wang, Yadong
PY - 2013
Y1 - 2013
N2 - Since the read lengths of high throughput sequencing (HTS) technologies are short, de novo assembly which plays significant roles in many applications remains a great challenge. Most of the state-of-the-art approaches base on de Bruijn graph strategy and overlap-layout strategy. However, these approaches which depend on k-mers or read overlaps do not fully utilize information of single-end and paired-end reads when resolving branches, e.g. the number and positions of reads supporting each possible extension are not taken into account when resolving branches. We present PERGA (Paired-End Reads Guided Assembler), a novel sequence-reads-guided de novo assembly approach, which adopts greedy-like prediction strategy for assembling reads to contigs and scaffolds. Instead of using single-end reads to construct contig, PERGA uses paired-end reads and different read overlap size thresholds ranging from Omax to Omin to resolve the gaps and branches. Moreover, by constructing a decision model using machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. When the correct extension cannot be determined, PERGA will try to extend the contigs by all feasible extensions and determine the correct extension by using look ahead technology.
AB - Since the read lengths of high throughput sequencing (HTS) technologies are short, de novo assembly which plays significant roles in many applications remains a great challenge. Most of the state-of-the-art approaches base on de Bruijn graph strategy and overlap-layout strategy. However, these approaches which depend on k-mers or read overlaps do not fully utilize information of single-end and paired-end reads when resolving branches, e.g. the number and positions of reads supporting each possible extension are not taken into account when resolving branches. We present PERGA (Paired-End Reads Guided Assembler), a novel sequence-reads-guided de novo assembly approach, which adopts greedy-like prediction strategy for assembling reads to contigs and scaffolds. Instead of using single-end reads to construct contig, PERGA uses paired-end reads and different read overlap size thresholds ranging from Omax to Omin to resolve the gaps and branches. Moreover, by constructing a decision model using machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. When the correct extension cannot be determined, PERGA will try to extend the contigs by all feasible extensions and determine the correct extension by using look ahead technology.
KW - Genome assembly
KW - Greedy-like prediction
KW - Look ahead technology
KW - Variable overlap sizes
UR - https://www.scopus.com/pages/publications/84888142004
U2 - 10.1145/2506583.2506612
DO - 10.1145/2506583.2506612
M3 - 会议稿件
AN - SCOPUS:84888142004
SN - 9781450324342
T3 - 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
SP - 161
EP - 170
BT - 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
Y2 - 22 September 2013 through 25 September 2013
ER -