Machine Learning-Based Approach for Filling Gaps in Streamflow Data

Authors

  • Jing Lin Ng, School of Civil Engineering, College of Engineering, Universiti Teknologi Mara (UiTM), 40450 Shah Alam, Selangor, Malaysia
  • Aik Hang Chong, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
  • Jin Chai Lee, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
  • Nur Ilya Farhana Md Noh, School of Civil Engineering, College of Engineering, Universiti Teknologi Mara (UiTM), 40450 Shah Alam, Selangor, Malaysia
  • Muyideen Abdulkareem, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
  • Deprizon Syamsunur, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
  • Ramez A. Al-Mansob, Department of Civil Engineering, International Islamic University Malaysia, Gombak, 53100, Malaysia
  • Majid Mirzaei, Department of Civil, Construction, and Environmental Engineering, University of Alabama, Tuscaloosa, AL, USA
  • Siaw Yin Thian, Water Resources and Climate Change Research Centre, National Hydraulic Research Institute of Malaysia (NAHRIM), Seri Kembangan, Selangor, 43300, Malaysia

DOI:

https://doi.org/10.37934/sijml.5.1.4663

Keywords:

KNN, Machine Learning Models, CART, Missing Streamflow Data, Estimation Method, Naïve Bayes (NB)

Abstract

The lack of streamflow data can significantly limit the flood prediction capacity of Malaysian agencies, including the National Disaster Management Agency (NADMA). To address this issue, we investigated the use of machine learning methods to estimate missing streamflow data at eleven stations in Peninsular Malaysia. We compared the performance of three machine learning methods (Naive Bayes, k-Nearest Neighbors (KNN), and Multiple Classification and Regression Tree (CART)) with five conventional methods (coefficient of correlation, Arithmetic Average Method, Inverse Distance Weighting Model, Linear Interpolation, and Normal Ratio), using the Coefficient of Correlation (R), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) as performance measures. After data collection, we assessed data quality with four homogeneity tests: the Pettitt test, the Buishand Range (BR) test, the Standard Normal Homogeneity Test (SNHT), and the Von Neumann Ratio (VNR) test. The homogeneity tests showed that the streamflow data series were not randomly distributed. The machine learning approach outperformed the conventional methods in estimating missing streamflow data. Naive Bayes, in particular, performed best, producing accurate estimates from only a modest amount of training data. The contribution of this study is the application of machine learning algorithms to estimate missing streamflow data, and the findings may support Malaysian flood control efforts. Overall, machine learning approaches show potential to improve the accuracy of streamflow data prediction, which is critical for effective flood control.
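Several of the conventional estimators and all three scores named in the abstract have standard textbook forms. The sketch below shows one common formulation of four of those methods (arithmetic average, normal ratio, inverse distance weighting, and linear interpolation in time) together with R, RMSE, and MAE. It is a minimal illustration, not the authors' implementation: the IDW power of 2, the use of long-term mean discharge in the normal ratio, the example values, and all function names are assumptions. The coefficient-of-correlation method and the machine learning models are not reproduced here.

```python
import math
from typing import List, Optional, Sequence

# Minimal sketch of textbook gap-filling estimators and error scores.
# ASSUMPTIONS (not from the paper): IDW power p = 2, normal ratio based on
# long-term mean discharge; all function names are hypothetical.


def arithmetic_average(neighbours: Sequence[float]) -> float:
    """Missing value = plain mean of the neighbouring stations' values."""
    return sum(neighbours) / len(neighbours)


def normal_ratio(neighbours: Sequence[float], neighbour_means: Sequence[float],
                 target_mean: float) -> float:
    """Q_x = (1/n) * sum(N_x / N_i * Q_i), where N is a long-term station mean."""
    terms = [target_mean / n_i * q_i for q_i, n_i in zip(neighbours, neighbour_means)]
    return sum(terms) / len(terms)


def inverse_distance_weighting(neighbours: Sequence[float], distances: Sequence[float],
                               power: float = 2.0) -> float:
    """Weight each neighbour by 1 / d**power (all distances must be > 0)."""
    weights = [1.0 / d ** power for d in distances]
    return sum(w * q for w, q in zip(weights, neighbours)) / sum(weights)


def linear_interpolation(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) in one station's record along the time axis.
    Gaps before the first or after the last observation are left as None."""
    filled = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for a, b in zip(observed, observed[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            filled[i] = series[a] + frac * (series[b] - series[a])
    return filled


def rmse(obs: Sequence[float], est: Sequence[float]) -> float:
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs: Sequence[float], est: Sequence[float]) -> float:
    """Mean absolute error between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def pearson_r(obs: Sequence[float], est: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observed and estimated values."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    return cov / (so * se)


if __name__ == "__main__":
    # Hypothetical same-day discharges (m^3/s) at three neighbouring stations.
    q = [12.0, 15.0, 9.0]
    print(arithmetic_average(q))
    print(normal_ratio(q, neighbour_means=[10.0, 12.0, 8.0], target_mean=11.0))
    print(inverse_distance_weighting(q, distances=[5.0, 10.0, 20.0]))
    print(linear_interpolation([1.0, None, None, 4.0]))
```

In a comparison like the one described, each method's estimates for artificially removed observations would then be scored against the withheld values with `rmse`, `mae`, and `pearson_r`; the machine learning models could be scored the same way.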


Published

2025-03-20

How to Cite

Ng, J. L., Chong, A. H. C., Lee, J. C. L., Md Noh, N. I. F., Abdulkareem, M., Syamsunur, D., … Thian, S. Y. T. (2025). Machine Learning-Based Approach for Filling Gaps in Streamflow Data. Semarak International Journal of Machine Learning, 5(1), 46–63. https://doi.org/10.37934/sijml.5.1.4663


Section

Articles
