TY - GEN
T1 - A Data-Driven Study of Prediction Methods for Coronary Heart Disease
AU - He, Xu
AU - Fan, Xindi
AU - Zheng, Wanxi
AU - Ti, Ziming
AU - Li, Chunshan
AU - Zhang, Hua
AU - Zhou, Xuequan
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2023
Y1 - 2023
N2 - Coronary heart disease (CHD) is a globally recognised, highly prevalent disease with a high risk of death and a low cure rate. The World Health Organization estimates that deaths from heart disease will reach 23 million by 2030. It is therefore imperative to find a fast and effective method for early diagnosis, so that patients can receive early intervention and treatment becomes more effective. As machine learning matures, data analysis and prediction can help doctors make a preliminary grouping of large populations and identify those at high risk of developing CHD. In this paper, three data pre-processing methods (SMOTE, Borderline-SMOTE and K-means SMOTE) were combined with four algorithms (Logistic Regression, Random Forest, KNN and SVM) to construct a CHD risk prediction model on an imbalanced data set. After analysing the data characteristics and tuning the parameters, different combinations of these methods were compared, and the better-performing classification method was selected to predict CHD, achieving higher accuracy, precision, AUC and F1 score. Overall, the experiments show that the random oversampling and SMOTE methods can effectively address the data imbalance problem in most cases. Final training accuracy reached up to 99%, and testing accuracy up to 93%.
KW - Random Forest
KW - SMOTE
KW - SVM
KW - machine learning
UR - https://www.scopus.com/pages/publications/85172688701
U2 - 10.1007/978-981-99-4402-6_32
DO - 10.1007/978-981-99-4402-6_32
M3 - Conference contribution
AN - SCOPUS:85172688701
SN - 9789819944019
T3 - Communications in Computer and Information Science
SP - 447
EP - 459
BT - Service Science - CCF 16th International Conference, ICSS 2023, Revised Selected Papers
A2 - Wang, Zhongjie
A2 - Xu, Hanchuan
A2 - Wang, Shangguang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th International Conference on Service Science, ICSS 2023
Y2 - 13 May 2023 through 14 May 2023
ER -