A data-driven framework for planning-oriented decision support in integrated watershed management: insights from machine learning in Northern China

  • Mei Yun Lu
  • Jie Ding*
  • Xin Lei Yu
  • Yi Lin Zhao
  • Ji Wei Pang
  • Yan Li
  • Shao Nan Shi
  • Nan Qi Ren
  • Shan Shan Yang
  • *Corresponding author for this work
  • School of Environment, Harbin Institute of Technology
  • Harbin Corner Science & Technology Inc.
  • Yancheng Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Spatial heterogeneity and anthropogenic disparities contribute to varying pollution challenges across global water bodies, highlighting the importance of understanding regional patterns and key pollution issues to support watershed management strategies tailored to local conditions. However, current research on watershed management decision-making is limited, with insufficient emphasis on the relationship between regional characteristics and pollution issues. To address this gap, this study developed a hybrid framework for identifying regional patterns and key pollution issues by coupling multiple machine learning models. This framework integrates K-means clustering with the extreme gradient boosting (XGBoost) classification model and employs Shapley additive explanations (SHAP) analysis to enhance the classification and recognition of regional feature patterns. Furthermore, six prediction models were evaluated to predict key pollution drivers, with the gradient boosting machine (GBM) showing superior performance (coefficient of determination, R² = 0.839; mean squared error, MSE = 0.00757). The results identified four distinct city clusters with divergent urban characteristics, including high pollution levels, well-developed agriculture, water shortage, and underdeveloped economies. Further analysis revealed specific pollution risks in different clusters, supporting the need for differentiated control priorities. The application of this framework in northern China demonstrates its effectiveness in identifying regional patterns and key pollution drivers, aiding governments and practitioners in efficiently conducting pre-planning for watershed pollution control based on regional characteristics. Positioned as the initial stage of a multi-layered decision-making architecture for sustainable watershed governance, the framework provides a valuable perspective and emphasizes the importance of developing a full-process decision support system.
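
The abstract compares prediction models by their coefficient of determination (R²) and mean squared error (MSE). As a minimal sketch of how these two metrics are conventionally computed, the Python snippet below implements the standard textbook definitions. The function names and the sample values `observed` and `predicted` are hypothetical illustrations; they are not the study's data or code.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical values, chosen only to exercise the functions.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print(f"MSE = {mean_squared_error(observed, predicted):.5f}")
print(f"R2  = {r_squared(observed, predicted):.3f}")
```

An R² closer to 1 and an MSE closer to 0 both indicate a closer fit. MSE depends on the scale of the target variable, so the two are usually read together.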

Original language: English
Article number: 100443
Journal: Water Research X
Volume: 29
State: Published - 1 Dec 2025
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 11 - Sustainable Cities and Communities

Keywords

  • Coupled models
  • Decision-making
  • Pattern recognition
  • Pollution risk prediction
  • Regional characteristics
