
Machine learning-based predictive modeling of HAAs concentration in secondary water supply system using UV–vis absorption and excitation-emission matrix (EEM) fluorescence spectroscopy

  • Harbin Institute of Technology Shenzhen
  • Shenzhen Key Laboratory of Water Resource Utilization and Environmental Pollution Control
  • China Northeast Municipal Engineering Design and Research Institute Co.
  • Shenzhen Wanmu Water Services Co.

Research output: Contribution to journal › Article › peer-review

Abstract

Haloacetic acids (HAAs), the second largest group of disinfection by-products (DBPs), pose significant toxicity and carcinogenic risks. Their concentrations in secondary water supply systems (SWSSs) are often higher than those in water distribution systems, highlighting the need for rapid and reliable monitoring. Traditional methods for HAAs monitoring are labor-intensive and time-consuming, whereas spectroscopic techniques offer fast response and ease of operation, making them a promising alternative. For the first time, this study elucidated the potential of ultraviolet-visible (UV–Vis) absorbance and excitation-emission matrix (EEM) fluorescence spectra, augmented by machine learning models, for HAAs prediction in SWSSs. Various methods for spectral preprocessing and feature selection were evaluated, including Savitzky-Golay (SG) smoothing, multiplicative scatter correction (MSC), first-order derivative (D1), and second-order derivative (D2) for UV–Vis spectra, and fluorescence spectral indices (FSIs) and parallel factor analysis (PARAFAC) for EEM spectra. Machine learning (ML) models, including multiple linear regression (MLR), support vector machine (SVM), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and random forest (RF), were compared. Among these, the RF model demonstrated the best predictive performance. Model performance improved further when spectral data were integrated with water quality parameters. Model interpretation and simplification were conducted using SHapley Additive exPlanations (SHAP) analysis. Specifically, the number of input variables for the RF model was reduced from 37, 17, 57, 38, 12, and 12 to 10, 10, 3, 14, 12, and 9 for MCAA, DCAA, TCAA, MBAA, HAA5, and HAA9, respectively, resulting in R² values of 0.933, 0.693, 0.640, 0.783, 0.806, and 0.768 for their prediction. The results of this study provide a novel, rapid, and reliable approach for HAAs monitoring in SWSSs, with significant implications for early-stage risk detection and water safety management.
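As an illustration of two of the UV–Vis preprocessing steps named in the abstract, the following is a minimal Python sketch of multiplicative scatter correction (MSC) followed by a first-order derivative (D1). It is not the authors' code: the function names `msc` and `first_derivative`, the forward-difference derivative, and the toy absorbance values are all assumptions made for illustration, and SG smoothing is omitted.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_points = len(spectra[0])
    # Reference spectrum: the pointwise mean over all samples.
    ref = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s ~ offset + slope * ref.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_var
        offset = s_mean - slope * ref_mean
        # Remove the fitted additive and multiplicative scatter terms.
        corrected.append([(v - offset) / slope for v in s])
    return corrected


def first_derivative(spectrum, step=1.0):
    """Forward-difference approximation of the first derivative (D1)."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


# Hypothetical toy data: one base absorbance curve plus two copies
# distorted by an additive offset and a multiplicative factor.
base = [0.90, 0.70, 0.55, 0.40, 0.30, 0.22]
spectra = [
    base,
    [0.1 + 1.2 * v for v in base],
    [-0.05 + 0.8 * v for v in base],
]
corrected = msc(spectra)
d1 = [first_derivative(s) for s in corrected]
```

Because each toy spectrum is an affine transform of the same base curve, MSC maps all three onto the mean spectrum, which is the behaviour the correction is designed to produce for pure scatter effects. Measured spectra would not collapse this cleanly.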

Original language: English
Article number: 114329
Journal: Microchemical Journal
Volume: 215
State: Published - Aug 2025
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 6 - Clean Water and Sanitation

Keywords

  • EEM
  • Haloacetic acids
  • RF
  • Secondary water supply systems
  • UV–Vis
