Skip to main navigation Skip to search Skip to main content

CHTKC: A robust and efficient k-mer counting algorithm based on a lock-free chaining hash table

  • Jianan Wang
  • , Su Chen
  • , Lili Dong
  • , Guohua Wang*
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Northeast Forestry University

Research output: Contribution to journalArticlepeer-review

Abstract

Motivation: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple, efficient counting of datasets with high sequencing depth or large genome size is a challenge. Results: We propose a robust and efficient method, CHTKC, to solve the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and handle situations where memory is not enough to accommodate all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that using a hash-table-based method to effectively solve the k-mer counting problem remains a feasible solution.

Original languageEnglish
Article numberbbaa063
JournalBriefings in Bioinformatics
Volume22
Issue number3
DOIs
StatePublished - 1 May 2021
Externally publishedYes

Keywords

  • DNA-seq
  • algorithm
  • assembly
  • hash table
  • k-mer counting
  • sequence analysis

Fingerprint

Dive into the research topics of 'CHTKC: A robust and efficient k-mer counting algorithm based on a lock-free chaining hash table'. Together they form a unique fingerprint.

Cite this