Machine Learning-Based Approach for Filling Gaps in Streamflow Data
DOI:
https://doi.org/10.37934/sijml.5.1.4663
Keywords:
KNN, Machine Learning Models, CART, Missing Streamflow Data, Estimation Method, Naïve Bayes (NB)
Abstract
Gaps in streamflow data can significantly impair the flood prediction capacity of Malaysian agencies, including the National Disaster Management Agency (NADMA). To address this issue, we investigated the use of machine learning methods to estimate missing streamflow data at eleven stations in Peninsular Malaysia. We compared three machine learning methods (Naïve Bayes, k-Nearest Neighbors, and Multiple Classification and Regression Tree) with five conventional methods (Coefficient of Correlation, Arithmetic Average Method, Inverse Distance Weighting Model, Linear Interpolation, and Normal Ratio), evaluating each with three statistical measures: Coefficient of Correlation (R), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). After data collection, we assessed the quality of the data with four homogeneity tests: the Pettitt test, the Buishand Range (BR) test, the Standard Normal Homogeneity Test (SNHT), and the Von Neumann Ratio (VNR) test. The results of the homogeneity tests showed that the streamflow data series were not randomly distributed. Our results indicated that the machine learning approach outperformed the conventional methods in estimating missing streamflow data. The Naïve Bayes approach, in particular, was the most successful, requiring only a modest quantity of training data to estimate the missing values accurately. Our study's contribution is the application of machine learning algorithms to estimate missing streamflow data, and our findings might help Malaysian flood control efforts. Overall, our findings show that machine learning approaches have the potential to improve the accuracy of streamflow data estimation, which is critical for successful flood control.
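As an illustration of two of the conventional estimators and the three evaluation measures named above, the following is a minimal sketch in Python. It uses the textbook forms of Inverse Distance Weighting and the Normal Ratio method; the abstract does not give the exact formulations used in the study, so the function names, the distance exponent `power=2.0`, and the argument layout are all assumptions made for illustration.

```python
import math


def idw_estimate(values, distances, power=2.0):
    """Inverse Distance Weighting (textbook form, assumed here).

    values:    flows observed at neighbouring stations on the missing day
    distances: distance from each neighbour to the target station (> 0)
    power:     distance exponent; 2.0 is a common default, not from the paper
    """
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def normal_ratio_estimate(values, neighbour_means, target_mean):
    """Normal Ratio (textbook form, assumed here).

    Each neighbour's flow is scaled by the ratio of the target station's
    long-term mean to that neighbour's long-term mean, then averaged.
    """
    scaled = [target_mean / m * v for m, v in zip(neighbour_means, values)]
    return sum(scaled) / len(scaled)


def rmse(observed, estimated):
    """Root Mean Square Error between observed and estimated flows."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean Absolute Error between observed and estimated flows."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n


def pearson_r(observed, estimated):
    """Coefficient of Correlation (Pearson's R)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)
```

A typical evaluation would hide known values from a station's record, estimate them with each method, and then compare the estimates against the hidden observations: lower RMSE and MAE and an R closer to 1 indicate a better gap-filling method.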
