A Diabetes Prediction System Based on Incomplete Fused Data Sources

  • Zhaoyi Yuan
  • Hao Ding
  • Guoqing Chao*
  • Mingqiang Song
  • Lei Wang
  • Weiping Ding
  • Dianhui Chu
  • *Corresponding author for this work
  • School of Computer Science and Technology (School of Software), Harbin Institute of Technology Weihai
  • Shandong University
  • CAS - Suzhou Institute of Biomedical Engineering and Technology
  • Nantong University

Research output: Contribution to journal, Article, peer-review

Abstract

In recent years, the diabetes population has grown younger. Timely and effective prediction of diabetes has therefore become a key problem, especially when only a single data source is available. Meanwhile, many data sources on diabetes patients have been collected around the world, and integrating these heterogeneous sources is extremely important for predicting diabetes accurately. The predictors may differ across the data sources used to predict diabetes; in other words, some features exist only in certain data sources, which leads to missing values in the fused dataset. To account for the uncertainty of these missing values, multiple imputation and a method based on graph representation are used to impute them. A logistic regression model and a stacking strategy are then applied for training and prediction on the fused dataset. It is shown that combining heterogeneous datasets and imputing the missing values produced in the fusion process can effectively improve the performance of diabetes prediction. In addition, the proposed method can be extended to any scenario where heterogeneous datasets with the same label types and different feature attributes exist.
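The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the records, feature names (`glucose`, `bmi`, `hba1c`) and function names are hypothetical, and a simple random draw from observed values stands in for the multiple imputation and graph-representation methods the paper uses. The sketch only shows how stacking sources with different features creates missing values, and how several completed copies of the fused table can be produced for downstream training.

```python
import random


def fuse_sources(sources):
    """Stack records from several sources over the union of their features.

    A feature that a source never collected becomes None (missing).
    """
    columns = sorted({key for records in sources for rec in records for key in rec})
    fused = [
        {col: rec.get(col) for col in columns}
        for records in sources
        for rec in records
    ]
    return columns, fused


def impute_once(fused, columns, label, rng):
    """Fill each missing feature with a random draw from its observed values.

    Assumption: a stand-in for a real imputation model, not the paper's method.
    """
    observed = {
        col: [row[col] for row in fused if row[col] is not None]
        for col in columns
    }
    completed = []
    for row in fused:
        new_row = dict(row)
        for col in columns:
            if col != label and new_row[col] is None:
                new_row[col] = rng.choice(observed[col])
        completed.append(new_row)
    return completed


def multiple_imputation(fused, columns, label, m=5, seed=0):
    """Return m completed copies of the fused table."""
    rng = random.Random(seed)
    return [impute_once(fused, columns, label, rng) for _ in range(m)]


# Hypothetical toy records: both sources share the label and one feature.
source_a = [
    {"glucose": 148, "bmi": 33.6, "diabetes": 1},
    {"glucose": 85, "bmi": 26.6, "diabetes": 0},
]
source_b = [
    {"glucose": 130, "hba1c": 7.1, "diabetes": 1},
    {"glucose": 90, "hba1c": 5.2, "diabetes": 0},
]

columns, fused = fuse_sources([source_a, source_b])
completed_sets = multiple_imputation(fused, columns, label="diabetes", m=3)
```

In a full pipeline, a classifier such as logistic regression would be fitted on each completed table and the fitted models combined, for example by stacking; that part is omitted here.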

Original language: English
Pages (from-to): 384-399
Number of pages: 16
Journal: Machine Learning and Knowledge Extraction
Volume: 5
Issue number: 2
DOIs
State: Published - Jun 2023
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • data sources fusion
  • diabetes prediction
  • ensemble learning
  • graph representation learning
  • missing values imputation
