PERGA: A paired-end read guided de novo assembler for extending contigs using SVM approach

  • Xiao Zhu
  • Henry C.M. Leung
  • Francis Y.L. Chin
  • Siu Ming Yiu
  • Guangri Quan
  • Bo Liu
  • Yadong Wang*
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • The University of Hong Kong
  • Harbin Institute of Technology Weihai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Because the reads produced by high-throughput sequencing (HTS) technologies are short, de novo assembly, which plays a significant role in many applications, remains a great challenge. Most state-of-the-art approaches are based on either the de Bruijn graph strategy or the overlap-layout strategy. However, these approaches, which depend on k-mers or read overlaps, do not fully utilize the information in single-end and paired-end reads when resolving branches; for example, they do not take into account the number and positions of the reads supporting each possible extension. We present PERGA (Paired-End Reads Guided Assembler), a novel sequence-reads-guided de novo assembly approach that adopts a greedy-like prediction strategy for assembling reads into contigs and scaffolds. Instead of using single-end reads to construct contigs, PERGA uses paired-end reads and different read overlap size thresholds, ranging from Omax to Omin, to resolve gaps and branches. Moreover, by constructing a decision model with a machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. When the correct extension cannot be determined, PERGA tries to extend the contigs by all feasible extensions and determines the correct extension using look-ahead technology.
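
The variable-overlap idea in the abstract can be illustrated with a small toy sketch. This is not PERGA's implementation: the function names, the voting rule, and the choice to stop at a branch are all assumptions made for illustration. The sketch tries the strictest overlap threshold first (Omax) and relaxes it toward Omin only when no read supports an extension; where the reads disagree on the next base, it simply stops, whereas the abstract describes a learned decision model and look-ahead for that case. Paired-end information is not modeled here.

```python
from collections import Counter


def votes_for_next_base(contig, reads, min_overlap):
    """Count the next base proposed by each read whose prefix overlaps
    the contig's suffix by at least `min_overlap` bases (hypothetical helper)."""
    votes = Counter()
    for read in reads:
        # The read must contribute at least one base beyond the overlap.
        longest = min(len(read) - 1, len(contig))
        for k in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                votes[read[k]] += 1
                break  # use only the longest overlap of this read
    return votes


def extend_contig(seed, reads, o_max, o_min, max_steps=1000):
    """Greedily extend `seed` one base at a time (toy sketch, assumes o_min >= 1).

    For each step, overlap thresholds are tried from o_max down to o_min.
    An empty vote set is treated as a gap and the threshold is relaxed;
    disagreeing votes are treated as a branch and extension stops.
    """
    contig = seed
    for _ in range(max_steps):
        chosen = None
        for overlap in range(o_max, o_min - 1, -1):
            votes = votes_for_next_base(contig, reads, overlap)
            if not votes:
                continue  # no supporting read: relax the threshold
            if len(votes) == 1:
                chosen = next(iter(votes))  # unambiguous extension
            break  # extension found, or branch left unresolved
        if chosen is None:
            break
        contig += chosen
    return contig


if __name__ == "__main__":
    genome = "ATGCGTACCTGAAGTC"
    reads = [genome[i:i + 6] for i in range(len(genome) - 5)]
    gapped = [r for r in reads if r != "GTACCT"]  # simulate a coverage gap
    print(extend_contig("ATGCGT", gapped, o_max=5, o_min=5))
    print(extend_contig("ATGCGT", gapped, o_max=5, o_min=3))
```

In the usage example, one read is removed to simulate a coverage gap: with a fixed threshold the extension halts at the gap, while allowing the threshold to fall to a smaller overlap lets a neighboring read bridge it.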

Original language: English
Title of host publication: 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
Pages: 161-170
Number of pages: 10
DOIs
State: Published - 2013
Externally published: Yes
Event: 2013 4th ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013 - Washington, DC, United States
Duration: 22 Sep 2013 - 25 Sep 2013

Publication series

Name: 2013 ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013

Conference

Conference: 2013 4th ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics, ACM-BCB 2013
Country/Territory: United States
City: Washington, DC
Period: 22/09/13 - 25/09/13

Keywords

  • Genome assembly
  • Greedy-like prediction
  • Look ahead technology
  • Variable overlap sizes
