Machine Learning-Based Approach for Filling Gaps in Streamflow Data

Jing Lin Ng; Aik Hang Chong; Jin Chai Lee; Nur Ilya Farhana Md Noh; Muyideen Abdulkareem; Deprizon Syamsunur; Ramez A. Al-Mansob; Majid Mirzaei; Siaw Yin Thian

doi:10.37934/sijml.5.1.4663

Authors

Jing Lin Ng, School of Civil Engineering, College of Engineering, Universiti Teknologi Mara (UiTM), 40450 Shah Alam, Selangor, Malaysia
Aik Hang Chong, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
Jin Chai Lee, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
Nur Ilya Farhana Md Noh, School of Civil Engineering, College of Engineering, Universiti Teknologi Mara (UiTM), 40450 Shah Alam, Selangor, Malaysia
Muyideen Abdulkareem, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
Deprizon Syamsunur, Department of Civil Engineering, Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, 56000, Malaysia
Ramez A. Al-Mansob, Department of Civil Engineering, International Islamic University Malaysia, Gombak, 53100, Malaysia
Majid Mirzaei, Department of Civil, Construction, and Environmental Engineering, University of Alabama, Tuscaloosa, AL, USA
Siaw Yin Thian, Water Resources and Climate Change Research Centre, National Hydraulic Research Institute of Malaysia (NAHRIM), Seri Kembangan, Selangor, 43300, Malaysia

Keywords:

KNN, Machine Learning Models, CART, Missing Streamflow Data, Estimation Method, Naïve Bayes (NB)

Abstract

Gaps in streamflow records can significantly impair the flood prediction capacity of Malaysian agencies, including the National Disaster Management Agency (NADMA). To address this issue, we investigated the use of machine learning methods to estimate missing streamflow data at eleven stations in Peninsular Malaysia. We compared three machine learning methods (Naïve Bayes (NB), k-Nearest Neighbors (KNN), and Multiple Classification and Regression Tree (CART)) with five conventional methods (Coefficient of Correlation, Arithmetic Average Method, Inverse Distance Weighting Model, Linear Interpolation, and Normal Ratio), evaluating each with three statistical measures: Coefficient of Correlation (R), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). After data collection, we assessed data quality with four homogeneity tests: the Pettitt test, the Buishand Range (BR) test, the Standard Normal Homogeneity Test (SNHT), and the Von Neumann Ratio (VNR) test. These tests showed that the streamflow data series were not randomly distributed. The machine learning methods outperformed the conventional methods in estimating missing streamflow data. Naïve Bayes performed best, requiring only a modest quantity of training data to produce accurate estimates. The contribution of this study is the application of machine learning algorithms to the estimation of missing streamflow data. Our findings show that these approaches have the potential to improve the accuracy of streamflow estimates, which is critical for successful flood control, and might help Malaysian flood control efforts.
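As an illustration of the kind of comparison described in the abstract, the following minimal sketch implements common textbook forms of three of the conventional estimators (arithmetic average, inverse distance weighting, and normal ratio) together with the three evaluation measures (R, RMSE, MAE). The abstract does not give the exact formulations used in the study, so the function names, the distance exponent of 2, and the input layout are all assumptions made for illustration only.

```python
import math


def arithmetic_average(neighbor_flows):
    """Estimate the missing value as the mean of concurrent neighbor flows."""
    return sum(neighbor_flows) / len(neighbor_flows)


def inverse_distance_weighting(neighbor_flows, distances, power=2.0):
    """Weight each neighbor by 1 / distance**power (power=2 is an assumption)."""
    weights = [1.0 / d ** power for d in distances]
    weighted_sum = sum(w * q for w, q in zip(weights, neighbor_flows))
    return weighted_sum / sum(weights)


def normal_ratio(neighbor_flows, neighbor_means, target_mean):
    """Scale each neighbor flow by the ratio of long-term means, then average."""
    scaled = [target_mean / m * q for m, q in zip(neighbor_means, neighbor_flows)]
    return sum(scaled) / len(scaled)


def rmse(observed, estimated):
    """Root Mean Square Error between observed and estimated series."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean Absolute Error between observed and estimated series."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


def correlation(observed, estimated):
    """Pearson coefficient of correlation (R) between two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)
```

In a typical evaluation of this kind, known values are withheld from a station's record, each method estimates them from neighboring stations, and the estimates are scored against the withheld observations; lower RMSE and MAE and higher R indicate a better method.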