Comparative Analysis of Machine Learning Algorithms for Diabetic Disease Identification
DOI:
https://doi.org/10.37934/araset.45.1.4050
Keywords:
PIMA Indian diabetic dataset, Machine learning algorithms, Jupyter notebook and sci-kit libraries
Abstract
This article presents a comparative analysis of machine learning algorithms (MLAs) used for diabetic disease identification. Machine learning algorithms now play a major role in identifying different types of diseases in the medical sector. Early prediction of diabetes makes it easier for physicians to treat patients and protect them from related diseases. To this end, seven MLAs are used for diabetic identification: support vector machine (SVM), decision tree (DT), logistic regression (LGR), gradient boost method (GDBM), k-nearest neighbour (KNN), XG boost (XGBM) and random forest (RF). The PIMA Indian diabetic dataset (PIMAIDD) is used to train and test the MLAs. The confusion matrix, accuracy (ACR), precision (PCN), recall (RCL), F1-score (FSC), the receiver operating characteristic (ROC) curve and K-fold cross-validation are used to evaluate the performance of the MLAs, and the experiments are implemented in Jupyter notebook with Python scikit-learn libraries. Six test cases were conducted; test case 4 (70%-30% split) performed best, and in that case RF reported better diabetic identification results than the other algorithms across the evaluation metrics.
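The evaluation metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the authors' implementation: the function names `confusion_counts` and `classification_metrics` are hypothetical, and the labels in the usage lines are made-up illustrative values, not results from the article.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # diabetic, predicted diabetic
        elif pred == positive:
            fp += 1  # non-diabetic, predicted diabetic
        elif truth == positive:
            fn += 1  # diabetic, predicted non-diabetic
        else:
            tn += 1  # non-diabetic, predicted non-diabetic
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1-score from the counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only (1 = diabetic, 0 = non-diabetic).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

In a medical screening setting, recall is often weighted heavily because a false negative (a missed diabetic case) is usually costlier than a false positive, which is one reason to report the full set of metrics rather than accuracy alone.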