Comparative Analysis of Machine Learning Algorithms for Diabetic Disease Identification

Dhasaradhan Kaveripakam; Jaichandran Ravichandran

doi:10.37934/araset.45.1.4050

Authors

Dhasaradhan Kaveripakam, Department of Computer Science and Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Mission’s Research Foundation (Deemed to be University), Paiyanoor, Chengalpattu District, Tamil Nadu 603104, India
Jaichandran Ravichandran, Department of Computer Science and Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Mission’s Research Foundation (Deemed to be University), Paiyanoor, Chengalpattu District, Tamil Nadu 603104, India

Keywords:

PIMA Indian diabetic dataset, Machine learning algorithms, Jupyter Notebook, scikit-learn libraries

Abstract

This article presents a comparative analysis of machine learning algorithms (MLAs) for diabetic disease identification. MLAs now play a major role in identifying different types of diseases in the medical sector. Early prediction of diabetes makes treatment easier for physicians and helps protect patients from other diseases. To that end, seven MLAs are used for diabetic identification: support vector machine (SVM), decision tree (DT), logistic regression (LGR), gradient boosting method (GDBM), k-nearest neighbour (KNN), XGBoost (XGBM) and random forest (RF). The PIMA Indian diabetic dataset (PIMAIDD) is used to train and test the MLAs. Performance is evaluated with the confusion matrix, accuracy (ACR), precision (PCN), recall (RCL), F1-score (FSC), the receiver operating characteristic (ROC) curve and K-fold cross-validation, and the experiments are implemented in Jupyter Notebook with the Python scikit-learn libraries. Six test cases were conducted. Test case 4 (70%-30%) performed best, and in it RF reported better results for diabetic identification than the other algorithms across the metric scores.
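The evaluation workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: synthetic data from `make_classification` (768 samples, 8 features) stands in for PIMAIDD, "70%-30%" is read as a train/test split, K is set to 5, and every model uses scikit-learn defaults. XGBoost is left out because it ships in the separate `xgboost` package rather than in scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for PIMAIDD (768 rows, 8 features, binary label).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=0)

# ASSUMPTION: "70%-30%" is interpreted as a 70% train / 30% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Default hyperparameters throughout; scale-sensitive models get a scaler.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "DT": DecisionTreeClassifier(random_state=0),
    "LGR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "GDBM": GradientBoostingClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "RF": RandomForestClassifier(random_state=0),
}

results = {}    # scalar metrics per model
confusion = {}  # 2x2 confusion matrix per model
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # positive-class probability
    confusion[name] = confusion_matrix(y_test, y_pred)
    results[name] = {
        "ACR": accuracy_score(y_test, y_pred),
        "PCN": precision_score(y_test, y_pred),
        "RCL": recall_score(y_test, y_pred),
        "FSC": f1_score(y_test, y_pred),
        "ROC_AUC": roc_auc_score(y_test, y_score),
        # ASSUMPTION: K = 5; mean accuracy over the folds of the full data.
        "CV_ACR": cross_val_score(model, X, y, cv=5).mean(),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because the data here is synthetic, the printed scores say nothing about which algorithm performs best on PIMAIDD; the sketch only shows how the listed metrics can be computed side by side for each model.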