Abstract
Haloacetic acids (HAAs), the second largest group of disinfection by-products (DBPs), pose significant toxicity and carcinogenic risks. Their concentrations in secondary water supply systems (SWSSs) are often higher than those in water distribution systems, highlighting the need for rapid and reliable monitoring. Traditional methods for HAA monitoring are labor-intensive and time-consuming, whereas spectroscopic techniques offer fast response and ease of operation, making them a promising alternative. This study elucidated, for the first time, the potential of ultraviolet and visible (UV–Vis) absorbance and excitation-emission matrix (EEM) fluorescence spectra, augmented by machine learning models, for HAA prediction in SWSSs. Several spectral preprocessing and feature selection methods were evaluated, including Savitzky-Golay (SG) filtering, multiplicative scatter correction (MSC), first-order derivative (D1), and second-order derivative (D2) for UV–Vis spectra, and fluorescence spectral indices (FSIs) and parallel factor analysis (PARAFAC) for EEM spectra. Machine learning (ML) models, including multiple linear regression (MLR), support vector machine (SVM), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and random forest (RF), were compared. Among these, the RF model demonstrated the best predictive performance. Model performance improved further when spectral data were integrated with water quality parameters. Model interpretation and simplification were conducted using SHapley Additive exPlanations (SHAP) analysis. Specifically, the number of input variables for the RF model was reduced from 37, 17, 57, 38, 12, and 12 to 10, 10, 3, 14, 12, and 9 for MCAA, DCAA, TCAA, MBAA, HAA5, and HAA9, respectively, yielding R² values of 0.933, 0.693, 0.640, 0.783, 0.806, and 0.768 for their prediction. The results of this study provide a novel, rapid, and reliable approach for HAA monitoring in SWSSs, with significant implications for early-stage risk detection and water safety management.
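The abstract names MSC and D1 among the UV–Vis preprocessing steps but does not describe how they were implemented. As an illustration only, the following is a minimal sketch of the textbook form of MSC (each spectrum regressed on the mean spectrum, then corrected by the fitted intercept and slope), followed by a simple finite-difference first derivative. The function names `msc` and `first_derivative` and the toy spectra are assumptions made for this example and do not come from the study.

```python
from statistics import fmean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x ~ a + b * ref by ordinary least squares,
    then corrected to (x - a) / b. Hypothetical helper, not the authors' code.
    """
    n_points = len(spectra[0])
    # Reference spectrum: mean absorbance at each wavelength.
    ref = [fmean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = fmean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for x in spectra:
        x_mean = fmean(x)
        # OLS slope (b) and intercept (a) of x regressed on the reference.
        b = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / ref_var
        a = x_mean - b * ref_mean
        corrected.append([(v - a) / b for v in x])
    return corrected


def first_derivative(spectrum, step=1.0):
    """First-order derivative (D1) by forward differences.

    The output is one point shorter than the input.
    """
    return [(spectrum[i + 1] - spectrum[i]) / step
            for i in range(len(spectrum) - 1)]


# Toy data (invented): one base shape with different offsets and scalings,
# mimicking additive and multiplicative scatter effects.
base = [0.10, 0.40, 0.90, 0.50, 0.20]
spectra = [
    [0.05 + 1.0 * v for v in base],
    [0.20 + 1.5 * v for v in base],
    [-0.10 + 0.5 * v for v in base],
]
corrected = msc(spectra)
d1 = [first_derivative(s) for s in corrected]
```

Because the toy spectra differ only by an offset and a scale factor, MSC maps all of them onto the same corrected spectrum. Real spectra will not collapse this cleanly, and derivatives are often computed with a Savitzky-Golay filter rather than raw differences to limit noise amplification.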
| Original language | English |
|---|---|
| Article number | 114329 |
| Journal | Microchemical Journal |
| Volume | 215 |
| State | Published - Aug 2025 |
| Externally published | Yes |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 6 Clean Water and Sanitation
Keywords
- EEM
- Haloacetic acids
- RF
- Secondary water supply systems
- UV–Vis
Title
Machine learning-based predictive modeling of HAAs concentration in secondary water supply system using UV–vis absorption and excitation-emission matrix (EEM) fluorescence spectroscopy