Reference-Guided Chromosome-by-Chromosome de novo Assembly at Scale Using Low-Coverage High-Fidelity Long-Reads with HiFiCCL

  • Zhongjun Jiang
  • Weihua Pan
  • Runtian Gao
  • Heng Hu
  • Wentao Gao
  • Murong Zhou
  • Yu Hang Yin
  • Zhipeng Qian
  • Shuilin Jin
  • Guohua Wang*
  • *Corresponding author for this work
  • Northeast Forestry University
  • Ministry of Agriculture of the People's Republic of China
  • School of Mathematics, Harbin Institute of Technology
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Population genomics using short-read resequencing captures single-nucleotide polymorphisms and small insertions and deletions but struggles with structural variants, leading to a loss of heritability in genome-wide association studies. In recent years, long-read sequencing has improved pangenome construction for diverse eukaryotic species, including humans, crops, and other organisms of ecological and economic importance, addressing this issue to some extent. Sufficient-coverage high-fidelity data for population genomics is often prohibitively expensive, limiting its use in large-scale populations and broader eukaryotic species and creating an urgent need for robust low-coverage assemblies. However, current assemblers underperform in such conditions. To address this, HiFiCCL is proposed, the first assembly framework specifically designed for low-coverage high-fidelity reads, using a reference-guided, chromosome-by-chromosome assembly approach. This study demonstrates that HiFiCCL improves low-coverage assembly performance of existing assemblers and outperforms the state-of-the-art assemblers on human and plant datasets. Tested on 45 human datasets (∼5× coverage), HiFiCCL combined with hifiasm reduces the length of misassembled contigs relative to hifiasm by an average of 21.19% and up to 38.58%. These improved assemblies excel in detecting large germline structural variants, minimize inter-chromosome mis-scaffolding, and improve the detection of specific germline and tumor somatic structural variants based on the pangenome graph.
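The abstract does not describe HiFiCCL's internals, so the following is only a minimal sketch of the general idea behind a reference-guided, chromosome-by-chromosome workflow: reads are first partitioned by the reference chromosome they align to best, and each partition can then be assembled independently. The function name `bin_reads_by_chromosome`, the `(read_id, chromosome, mapq)` tuple layout, and the `min_mapq` threshold are all hypothetical and chosen for illustration; they are not taken from the HiFiCCL software.

```python
from collections import defaultdict


def bin_reads_by_chromosome(alignments, min_mapq=20):
    """Group read IDs by the reference chromosome of their best alignment.

    `alignments` is an iterable of (read_id, chromosome, mapq) tuples, a
    hypothetical, simplified stand-in for records produced by a read aligner.
    Reads whose best alignment has mapping quality below `min_mapq` are
    placed in an "unassigned" bin instead of a chromosome bin.
    """
    # Keep only the highest-quality alignment seen for each read.
    best = {}
    for read_id, chrom, mapq in alignments:
        if read_id not in best or mapq > best[read_id][1]:
            best[read_id] = (chrom, mapq)

    # Distribute reads into per-chromosome bins.
    bins = defaultdict(list)
    for read_id, (chrom, mapq) in best.items():
        target = chrom if mapq >= min_mapq else "unassigned"
        bins[target].append(read_id)
    return dict(bins)


# Illustrative input only; these are not real data.
alignments = [
    ("read1", "chr1", 60),
    ("read1", "chr2", 10),  # weaker secondary hit, ignored
    ("read2", "chr2", 55),
    ("read3", "chr1", 5),   # too ambiguous to assign
]
bins = bin_reads_by_chromosome(alignments)
```

In a workflow of this shape, each chromosome bin would be handed to an existing assembler separately, and how ambiguous or unassigned reads are treated is a design decision this sketch leaves open.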

Original language: English
Article number: e15308
Journal: Advanced Science
Volume: 13
Issue number: 13
DOIs
State: Published - 3 Mar 2026
Externally published: Yes

Keywords

  • chromosome-by-chromosome
  • long high-fidelity reads
  • low coverage
  • population genomics
  • reference-guided de novo assembly
