
Predicting diabetes status using ensemble algorithms with hyperparameter tuning

Sonjit Mondol, Dr. Ajit Kumar Majumder, Dr. Mohammad Alamgir Kabir


Diabetes is a condition in which the body cannot produce enough insulin to keep blood sugar levels under control. If it is not identified and treated properly, it can lead to kidney failure, nerve damage, blindness, and coronary heart disease. Early identification of diabetes is therefore important for maintaining a healthy life. However, assessing a person's diabetic status is difficult in remote areas and in other places where testing facilities are scarce. Even in urban regions, scheduling an appointment at a diagnostic center and consulting a doctor adds time and cost to diabetes monitoring. Machine learning, which offers numerous methods for classification problems, can help address these concerns. The objective of this work is to design an ensemble algorithm and optimize its hyperparameters so that diabetes can be identified accurately from a patient's early symptoms without a diagnostic test. To that end, three ensemble algorithms (Boosting, Bagging, and Random Forest) are applied, together with a grid search hyperparameter tuning strategy, to the Bangladesh Demographic and Health Survey (BDHS) 2017–18 dataset. Their performance is assessed using accuracy, sensitivity, specificity, kappa, and the ROC curve. The Boosting algorithm achieves the highest accuracy of the three, 77.65%, with a 7.06% improvement attributable to hyperparameter tuning.
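As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, sensitivity, specificity, and Cohen's kappa from binary predictions. It is not the authors' code: the function names and the toy labels are hypothetical, and the positive class (diabetic) is assumed to be coded as 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and Cohen's kappa."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else float("nan")
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical toy labels (1 = diabetic, 0 = not diabetic); not BDHS data.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when one class is much more common than the other.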




