LAK: Lasso and K-Means Based Single-Cell RNA-Seq Data Clustering Analysis

  • Jiao Hua
  • Hongkun Liu
  • Boyang Zhang
  • Shuilin Jin*
  • *Corresponding author for this work
  • School of Mathematics, Harbin Institute of Technology
  • Ocean University of China

Research output: Contribution to journal, Article, peer-review

Abstract

Single-cell RNA sequencing provides a way to identify marker genes of different cells, which lays the foundation for discovering new cell types. The general strategy is to build a clustering pipeline and derive differentially expressed genes, followed by cell type enrichment analysis and driving force analysis. Throughout this process, choosing a clustering model and choosing an appropriate dimension reduction method are two vital and challenging tasks. In this study, we present LAK, a computational pipeline for single-cell RNA-seq data clustering analysis that uses a Lasso and K-means based feature selection method to select candidate genes. To deal with sparse, high-dimensional data, we integrated a Lasso penalty into the clustering method as the feature selection step, which extracts the genes that have an actual effect on the clustering. We also improved the parameter selection algorithm so that appropriate parameters are found automatically by binary search according to the size of the data. Compared with other computational approaches, LAK performs better in reliability, stability, convenience and accuracy on real datasets, simulated data, and datasets with a large number of dropout events.
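The idea of a Lasso penalty inside K-means can be illustrated with a small sketch. The abstract does not give LAK's objective or its parameter search, so the code below is not the authors' implementation. As a stand-in it uses one well-known formulation, sparse K-means in the style of Witten and Tibshirani: each gene gets a non-negative weight w_j, the weighted between-cluster sum of squares is maximized subject to ||w||_2 <= 1 and ||w||_1 <= s, and genes whose weight shrinks to zero are dropped. All function names, the toy data and the bound s = 1.8 are assumptions made for illustration. The binary search here only finds the soft-threshold level for a hand-picked s; it is not the automatic parameter search that the abstract mentions.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def between_cluster_ss(X, labels):
    """Per-gene between-cluster sum of squares: total SS minus within SS."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    return np.maximum(total - within, 0.0)  # clip tiny negative round-off

def update_weights(bcss, s, n_steps=60):
    """Soft-threshold bcss so that ||w||_2 = 1 and ||w||_1 <= s (assumes s >= 1)."""
    w = bcss / np.linalg.norm(bcss)
    if w.sum() <= s:
        return w  # L1 bound already holds, no shrinkage needed
    lo, hi = 0.0, bcss.max()
    for _ in range(n_steps):  # binary search on the threshold delta
        delta = (lo + hi) / 2.0
        w = np.maximum(bcss - delta, 0.0)
        w /= np.linalg.norm(w)
        if w.sum() > s:
            lo = delta  # still too dense: shrink harder
        else:
            hi = delta  # sparse enough: try a smaller threshold
    return w

def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate between clustering on weighted genes and re-weighting genes."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)
        w = update_weights(between_cluster_ss(X, labels), s)
    return labels, w

def make_toy_data(seed=0, n_per_cluster=20, n_genes=30, n_informative=4):
    """Hypothetical data: 3 clusters that differ only in the first few genes."""
    rng = np.random.default_rng(seed)
    truth = np.repeat(np.arange(3), n_per_cluster)
    X = rng.normal(size=(truth.size, n_genes))
    X[:, :n_informative] += 6.0 * truth[:, None]
    return X, truth

if __name__ == "__main__":
    X, truth = make_toy_data()
    labels, w = sparse_kmeans(X, k=3, s=1.8)
    print("genes kept:", np.flatnonzero(w > 0))
```

In this formulation a smaller s forces more gene weights to zero, so s plays the role of the tuning parameter that trades off sparsity against clustering signal. On real count data one would normally normalize and log-transform the expression matrix first; that step is omitted here.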

Original language: English
Article number: 9143102
Pages (from-to): 129679-129688
Number of pages: 10
Journal: IEEE Access
Volume: 8
State: Published - 2020
Externally published: Yes

Keywords

  • Clustering analysis
  • Lasso
  • single-cell RNA-seq data
