
TideHunter: Efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain

  • Harbin Institute of Technology
  • The Children’s Hospital of Philadelphia
  • University of Pennsylvania

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing technologies can produce long reads of up to tens of kilobases, but with high error rates. To reduce sequencing error, Rolling Circle Amplification (RCA) has been used during library preparation to amplify circularized template molecules. The linear products of RCA contain multiple tandem copies of the template molecule. With additional in silico processing, these tandem copies can be collapsed into a consensus sequence that is more accurate than the original raw read. Existing pipelines, which use alignment-based methods to discover tandem repeat patterns in long reads, are either inefficient or lack sensitivity.

Results: We present TideHunter, a novel tandem repeat detection and consensus calling tool that efficiently discovers tandem repeat patterns and generates high-quality consensus sequences from amplified, tandemly repeated long-read sequencing data. TideHunter works with noisy long reads (PacBio and ONT) at error rates of up to 20% and places no limit on the maximum repeat pattern size. We benchmarked TideHunter on simulated and real datasets with varying error rates and repeat pattern sizes. TideHunter is tens of times faster than state-of-the-art methods and has higher sensitivity and accuracy.
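
To make the idea of collapsing tandem copies into a consensus concrete, here is a minimal Python sketch. It is not TideHunter's algorithm or interface. The function names `estimate_period` and `majority_consensus` are hypothetical, and the sketch assumes substitution-only errors so that copies stay in register. Real PacBio and ONT reads also contain insertions and deletions, which is why a practical tool needs chaining and alignment rather than a fixed stride. The period is guessed from the most frequent distance between consecutive occurrences of identical k-mers (a simple seed-based heuristic), and the consensus is a column-wise majority vote.

```python
from collections import Counter
from typing import Optional


def estimate_period(read: str, k: int = 8) -> Optional[int]:
    """Guess the tandem repeat period of `read`.

    Hypothetical helper: records the distance between each k-mer and its
    previous occurrence, then returns the most frequent distance.
    Returns None when no k-mer occurs twice.
    """
    last_seen = {}          # k-mer -> start index of its latest occurrence
    distances = Counter()   # distance between consecutive occurrences -> count
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not distances:
        return None
    return distances.most_common(1)[0][0]


def majority_consensus(read: str, period: int) -> str:
    """Collapse tandem copies by majority vote at each position of the period.

    Assumes substitution-only errors, so base i belongs to column i % period.
    The result is in the phase of the read start.
    """
    columns = [Counter() for _ in range(period)]
    for i, base in enumerate(read):
        columns[i % period][base] += 1
    return "".join(col.most_common(1)[0][0] for col in columns)


if __name__ == "__main__":
    unit = "ACGTTGCAAGCTTAGGCTAACG"   # made-up 22-base template
    noisy = list(unit * 5)            # five tandem copies
    noisy[1 * 22 + 3] = "A"           # one substitution in copy 1
    noisy[2 * 22 + 10] = "G"          # one substitution in copy 2
    read = "".join(noisy)
    period = estimate_period(read)
    print(period, majority_consensus(read, period))
```

With several copies per read, an error in any single copy is outvoted by the remaining copies, which is the intuition behind the accuracy gain described in the abstract.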

Original language: English
Article number: btz376
Pages (from-to): i200-i207
Journal: Bioinformatics
Volume: 35
Issue number: 14
State: Published - 15 Jul 2019
