Analysis of Big Data in Education to Predict Dropout and Identify At-Risk Students

Document Type: Scientific research

Author
Master's Student of Educational Psychology, Educational Psychology Department, Shabestar Branch, Islamic Azad University, Shabestar, Iran
Abstract
Student dropout is a fundamental challenge for education systems worldwide, imposing significant economic and social costs on both society and individuals. With the development of modern technologies and the widespread use of educational information systems, a vast amount of educational data is generated, enabling advanced analytics. Big data and machine learning techniques have created new opportunities for the early identification of students at risk of dropping out. The main objective of this study is to develop a comprehensive model for predicting student dropout using big data analysis and machine learning algorithms. The research questions are: 1) Which factors have the most significant impact on the risk of dropping out? 2) Which machine learning algorithm performs best in predicting dropout? 3) How can an effective early warning system be designed? This research was conducted using a quantitative, descriptive-correlational design. The sample consisted of 450 students from various Iranian universities, selected through stratified random sampling. The data included demographic variables, academic performance, attendance rates, participation levels, and socio-economic factors. Six machine learning algorithms (Logistic Regression, Random Forest, Support Vector Machine, Naive Bayes, Neural Network, and Decision Tree) were used for data analysis. The models were validated using 10-fold cross-validation. The results showed that the Naive Bayes algorithm performed best, with an accuracy of 92.4%, precision of 89.7%, and sensitivity of 94.2%. The most important predictors of dropout were current GPA (0.284), attendance rate (0.195), participation score (0.148), and weekly study hours (0.117). A significant negative correlation was observed between current GPA and dropout risk (r = -0.674). Regression analysis indicated that all main variables except gender had a significant effect on dropout risk.
This study demonstrates that using big data and machine learning algorithms can be a powerful tool for predicting dropout and identifying at-risk students. The results contribute to the development of early warning systems that allow for timely intervention and dropout prevention. The practical implications of this research include improving educational policymaking, optimizing resource allocation, and enhancing the quality of student support services.
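To make the validation procedure named in the abstract concrete, the following is a minimal sketch of stratified 10-fold cross-validation together with the three reported metrics (accuracy, precision, sensitivity). It rests on assumptions that are not from the study: the data are synthetic, the classifier is a toy nearest-class-mean rule standing in for any of the six algorithms, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and sensitivity (recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


def fit_threshold(xs, ys):
    """Toy stand-in model (assumption): predict the class whose training mean is nearer."""
    mean_pos = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean_neg = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return lambda x: 1 if abs(x - mean_pos) < abs(x - mean_neg) else 0


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    y_pred = [None] * len(ys)
    for held_out in stratified_folds(ys, k, seed):
        held = set(held_out)
        train = [i for i in range(len(ys)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            y_pred[i] = predict(xs[i])
    return classification_metrics(ys, y_pred)


if __name__ == "__main__":
    # Synthetic data only: 450 records, 1 = dropout, with a single GPA-like feature.
    rng = random.Random(42)
    ys = [1] * 90 + [0] * 360
    xs = [rng.gauss(11.0 if y == 1 else 15.0, 2.0) for y in ys]
    print(cross_validate(xs, ys, fit_threshold))
```

Sensitivity is the share of actual dropouts that the model flags, which is why an early warning system would usually weight it heavily. Replacing `fit_threshold` with a real learner leaves the fold logic and the metric definitions unchanged.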

Articles in Press, Accepted Manuscript
Available Online from 24 July 2025

  • Receive Date 24 July 2025
  • Accept Date 15 May 2025